Keywords can be classified automatically through community detection, and several clustering algorithms are available for this purpose. This vignette provides a benchmark that compares these algorithms on networks of several sizes.
First, we’ll load the needed packages.
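The loading code does not appear in this text. A minimal sketch of that step follows, assuming the needed packages are akc, dplyr, and stringr. This list is inferred from the functions called later in the vignette (keyword_group(), bind_rows(), str_extract(), and so on) and is not taken from the original source.

```r
# Assumed package list, inferred from the functions used later in this vignette.
needed_pkgs <- c("akc", "dplyr", "stringr")

# Check which of the assumed packages are installed before attaching them.
installed <- vapply(needed_pkgs, requireNamespace, logical(1), quietly = TRUE)

if (!all(installed)) {
  message("Not installed: ", paste(needed_pkgs[!installed], collapse = ", "))
}

# Attach only the packages that are available.
for (pkg in needed_pkgs[installed]) {
  library(pkg, character.only = TRUE)
}
```

In an interactive session, three plain library() calls would do the same job; the check is only there so the sketch fails gently when a package is missing.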
Then, we prepare the needed data. The built-in data table bibli_data_table is used here.
bibli_data_table %>%
  keyword_clean() %>%
  keyword_merge() -> clean_data
Next, we define the combinations of network size and community detection algorithm to be tested:
seq(100, 300, by = 100) -> topn_sample
ls("package:akc") %>%
  str_extract("^group.+") %>%
  na.omit() %>%
  setdiff(c("group_biconnected_component",
            "group_components",
            "group_optimal")) -> com_detect_fun_list
Finally, we’ll implement the computation and record the results.
# Collect one row of results per (algorithm, network size) pair
all = tibble()
for(i in com_detect_fun_list){
  for(j in topn_sample){
    # Time the grouping step; grouped_network_table is assigned inside system.time()
    system.time({
      clean_data %>%
        keyword_group(top = j,com_detect_fun = get(i)) %>%
        as_tibble -> grouped_network_table
    }) %>% na.omit -> time_info
    # Number of nodes and number of groups
    grouped_network_table %>% nrow -> node_no
    grouped_network_table %>% distinct(group) %>% nrow -> group_no
    # Mean and standard deviation of group size
    grouped_network_table %>%
      count(group) %>%
      summarise(mean(n)) %>%
      .[[1]] -> group_avg_node_no
    grouped_network_table %>%
      count(group) %>%
      summarise(sd(n)) %>%
      .[[1]] -> group_sd_node_no
    # Append the record; c() coerces all values to character here
    c(com_detect_fun = i,
      topn = j,
      node_no = node_no,group_no = group_no,
      avg = group_avg_node_no,
      sd = group_sd_node_no,time_info[1:3]) %>%
      bind_rows(all,.) -> all
  }
}
# Convert the numeric columns back from character, keep one row per
# algorithm and node count, and give the columns readable names
res = all %>%
  mutate_at(2:9,function(x) as.numeric(x) %>% round(2)) %>%
  distinct(com_detect_fun,node_no,.keep_all = TRUE) %>%
  select(-topn,-contains("self")) %>%
  setNames(c("com_detect_fun","No. of total nodes","No. of total groups",
             "Average node number in each group","Standard deviation of node number",
             "Computer running time for keyword_group function"))
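The loop above stores the first three elements of the system.time() result, and the later select(-contains("self")) step leaves only the elapsed time. The sketch below shows that timing pattern in base R. Sys.sleep() is a stand-in workload chosen for illustration and is not part of the benchmark.

```r
# Time a stand-in workload; system.time() reports times in seconds.
timing <- system.time({
  Sys.sleep(0.2)
})

# The result is a named numeric vector of length five. The last two
# elements (child process times) can be NA on some platforms.
timing_names <- names(timing)

# First three elements: user.self, sys.self, elapsed.
first_three <- timing[1:3]

# Dropping the names that contain "self" leaves the elapsed time only.
kept_names <- names(first_three)[!grepl("self", names(first_three))]
elapsed <- timing[["elapsed"]]
```

Elapsed time is the wall-clock duration of the call, so it also includes any time the process spent waiting.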
The results are displayed in the following table. The running time is the elapsed time reported by system.time(), in seconds.
knitr::kable(res)
com_detect_fun | No. of total nodes | No. of total groups | Average node number in each group | Standard deviation of node number | Computer running time for keyword_group function |
---|---|---|---|---|---|
group_edge_betweenness | 103 | 36 | 2.86 | 9.17 | 0.50 |
group_edge_betweenness | 207 | 68 | 3.04 | 12.53 | 2.98 |
group_edge_betweenness | 326 | 89 | 3.66 | 13.12 | 10.03 |
group_fast_greedy | 103 | 5 | 20.60 | 8.17 | 0.17 |
group_fast_greedy | 207 | 5 | 41.40 | 24.36 | 0.18 |
group_fast_greedy | 326 | 6 | 54.33 | 34.77 | 0.19 |
group_infomap | 103 | 1 | 103.00 | NA | 0.17 |
group_infomap | 207 | 4 | 51.75 | 94.83 | 0.22 |
group_infomap | 326 | 6 | 54.33 | 114.98 | 0.34 |
group_label_prop | 103 | 1 | 103.00 | NA | 0.16 |
group_label_prop | 207 | 1 | 207.00 | NA | 0.17 |
group_label_prop | 326 | 1 | 326.00 | NA | 0.18 |
group_leading_eigen | 103 | 4 | 25.75 | 9.57 | 0.17 |
group_leading_eigen | 207 | 5 | 41.40 | 19.19 | 0.18 |
group_leading_eigen | 326 | 7 | 46.57 | 35.15 | 0.22 |
group_louvain | 103 | 5 | 20.60 | 12.14 | 0.16 |
group_louvain | 207 | 8 | 25.88 | 14.11 | 0.17 |
group_louvain | 326 | 9 | 36.22 | 19.08 | 0.18 |
group_spinglass | 103 | 5 | 20.60 | 5.13 | 1.66 |
group_spinglass | 207 | 8 | 25.88 | 13.38 | 4.04 |
group_spinglass | 326 | 8 | 40.75 | 12.07 | 7.30 |
group_walktrap | 103 | 103 | 1.00 | 0.00 | 0.16 |
group_walktrap | 207 | 207 | 1.00 | 0.00 | 0.17 |
group_walktrap | 326 | 326 | 1.00 | 0.00 | 0.17 |
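
The NA entries in the standard deviation column occur where an algorithm returns a single group: the standard deviation of one value is undefined, and sd() in R returns NA for it. At the other extreme, when every node is its own group, all group sizes equal 1 and the standard deviation is 0. A minimal base R sketch with made-up group labels (illustrative vectors, not benchmark data):

```r
# Case 1: five nodes that all fall into one group.
one_group <- rep(1, 5)
sizes_one <- as.vector(table(one_group))      # a single group of size 5
sd_one <- sd(sizes_one)                       # NA: only one value

# Case 2: five nodes, each in its own group.
singletons <- 1:5
sizes_single <- as.vector(table(singletons))  # five groups of size 1
mean_single <- mean(sizes_single)             # 1
sd_single <- sd(sizes_single)                 # 0: all sizes are equal
```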
The session information is shown below:
sessionInfo()
#> R version 4.2.2 (2022-10-31 ucrt)
#> Platform: x86_64-w64-mingw32/x64 (64-bit)
#> Running under: Windows 10 x64 (build 19044)
#>
#> Matrix products: default
#>
#> locale:
#> [1] LC_COLLATE=Chinese (Simplified)_China.utf8
#> [2] LC_CTYPE=Chinese (Simplified)_China.utf8
#> [3] LC_MONETARY=Chinese (Simplified)_China.utf8
#> [4] LC_NUMERIC=C
#> [5] LC_TIME=Chinese (Simplified)_China.utf8
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> loaded via a namespace (and not attached):
#> [1] rstudioapi_0.14 knitr_1.41 magrittr_2.0.3 R6_2.5.1
#> [5] ragg_1.2.4 rlang_1.0.6 fastmap_1.1.0 highr_0.10
#> [9] stringr_1.5.0 tools_4.2.2 xfun_0.36 cli_3.5.0
#> [13] jquerylib_0.1.4 systemfonts_1.0.4 htmltools_0.5.4 yaml_2.3.6
#> [17] digest_0.6.31 rprojroot_2.0.3 lifecycle_1.0.3 pkgdown_2.0.7
#> [21] textshaping_0.3.6 purrr_1.0.0 sass_0.4.4 vctrs_0.5.1
#> [25] fs_1.5.2 memoise_2.0.1 glue_1.6.2 cachem_1.0.6
#> [29] evaluate_0.19 rmarkdown_2.19 stringi_1.7.8 compiler_4.2.2
#> [33] bslib_0.4.2 desc_1.4.2 jsonlite_1.8.4