Keywords can be classified automatically through community detection, and several clustering algorithms are available for this purpose. This vignette benchmarks those algorithms on networks of several sizes to show how they differ.

First, we’ll load the needed packages.
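
The package-loading chunk is not shown in this rendering. Judging from the functions called later (`keyword_clean()`, `%>%`, `tibble()`, `str_extract()`), a plausible set is the following. This is an assumption inferred from the code, not the vignette's original chunk.

```r
# Assumed package set, inferred from the functions used in the chunks below.
library(akc)      # keyword_clean(), keyword_merge(), keyword_group(), group_* functions
library(dplyr)    # %>%, tibble(), distinct(), count(), summarise(), bind_rows()
library(stringr)  # str_extract()
```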

Then we prepare the data. The built-in data table bibli_data_table is used here.

bibli_data_table %>% 
  keyword_clean() %>% 
  keyword_merge() -> clean_data

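Next, we define the combinations of network size and community detection algorithm to be tested. Every exported function whose name starts with "group" is included, except the three listed in setdiff():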

c(100, 200, 300) -> topn_sample
ls("package:akc") %>% 
  str_extract("^group.+") %>% 
  na.omit() %>% 
  setdiff(c("group_biconnected_component",
            "group_components",
            "group_optimal")) -> com_detect_fun_list
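
The selection logic above can be tried in isolation. The following is a minimal sketch that uses base R's `grep()` in place of `str_extract()` plus `na.omit()`, applied to a hypothetical vector of exported names rather than the real `ls("package:akc")` output.

```r
# Hypothetical stand-in for ls("package:akc"); the real export list is longer.
exported <- c("keyword_clean", "keyword_group", "group_louvain",
              "group_fast_greedy", "group_optimal", "group_components")

# Keep only names that start with "group" (base R equivalent of
# str_extract("^group.+") followed by na.omit()).
candidates <- grep("^group.+", exported, value = TRUE)

# Drop the functions that are excluded from the benchmark.
excluded <- c("group_biconnected_component", "group_components", "group_optimal")
selected <- setdiff(candidates, excluded)

print(selected)
```

Note that `keyword_group` is not selected, because the pattern is anchored to the start of the name.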

Finally, we run the benchmark and record the results.

# Accumulator: one row of results per (algorithm, network size) pair
all = tibble()
for(i in com_detect_fun_list){
    for(j in topn_sample){
      # Time the grouping step; na.omit() drops any NA entries of the timing vector
      system.time({
        clean_data %>% 
          keyword_group(top = j,com_detect_fun = get(i)) %>% 
          as_tibble -> grouped_network_table
      }) %>% na.omit -> time_info
      # Network size and number of detected groups
      grouped_network_table %>% nrow -> node_no
      grouped_network_table %>% distinct(group) %>% nrow -> group_no
      # Mean and standard deviation of the group sizes
      grouped_network_table %>% 
        count(group) %>% 
        summarise(mean(n)) %>% 
        .[[1]] -> group_avg_node_no
      grouped_network_table %>% 
        count(group) %>% 
        summarise(sd(n)) %>% 
        .[[1]] -> group_sd_node_no
      # Append one row; c() coerces all values to character, converted back later
      c(com_detect_fun = i, 
        topn = j,
        node_no = node_no,group_no = group_no,
        avg = group_avg_node_no,
        sd = group_sd_node_no,time_info[1:3]) %>% 
        bind_rows(all,.) -> all
    }
}
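
The timing idiom in the loop relies on two properties of `system.time()`: assignments made inside the timed expression remain visible afterwards, and the result is a named vector whose first three entries are `user.self`, `sys.self` and `elapsed`. A minimal base R sketch, with an arbitrary placeholder workload standing in for `keyword_group()`:

```r
# Time an arbitrary placeholder workload (not the akc benchmark itself).
timing <- system.time({
  total <- sum(sqrt(seq_len(1e6)))
})

# The value assigned inside the timed block is still available here.
print(total > 0)

# Names of the first three timing components, as used by time_info[1:3].
print(names(timing)[1:3])

# Wall-clock seconds for the timed block.
elapsed_seconds <- timing[["elapsed"]]
print(elapsed_seconds >= 0)
```

According to the `proc.time` documentation, the two remaining components (child process times) are NA on Windows, which is presumably why the loop applies `na.omit()` before indexing.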

res = all %>% 
  mutate_at(2:9,function(x) as.numeric(x) %>% round(2)) %>% 
  distinct(com_detect_fun,node_no,.keep_all = TRUE) %>% 
  select(-topn,-contains("self")) %>% 
  setNames(c("com_detect_fun","No. of total nodes","No. of total groups",
             "Average node number in each group","Standard deviation of node number",
             "Computer running time for keyword_group function")) 

The results are displayed in the following table. Running times are elapsed seconds as reported by system.time().

knitr::kable(res)

| com_detect_fun | No. of total nodes | No. of total groups | Average node number in each group | Standard deviation of node number | Computer running time for keyword_group function |
|:---|---:|---:|---:|---:|---:|
| group_edge_betweenness | 103 | 36 | 2.86 | 9.17 | 0.50 |
| group_edge_betweenness | 207 | 68 | 3.04 | 12.53 | 2.98 |
| group_edge_betweenness | 326 | 89 | 3.66 | 13.12 | 10.03 |
| group_fast_greedy | 103 | 5 | 20.60 | 8.17 | 0.17 |
| group_fast_greedy | 207 | 5 | 41.40 | 24.36 | 0.18 |
| group_fast_greedy | 326 | 6 | 54.33 | 34.77 | 0.19 |
| group_infomap | 103 | 1 | 103.00 | NA | 0.17 |
| group_infomap | 207 | 4 | 51.75 | 94.83 | 0.22 |
| group_infomap | 326 | 6 | 54.33 | 114.98 | 0.34 |
| group_label_prop | 103 | 1 | 103.00 | NA | 0.16 |
| group_label_prop | 207 | 1 | 207.00 | NA | 0.17 |
| group_label_prop | 326 | 1 | 326.00 | NA | 0.18 |
| group_leading_eigen | 103 | 4 | 25.75 | 9.57 | 0.17 |
| group_leading_eigen | 207 | 5 | 41.40 | 19.19 | 0.18 |
| group_leading_eigen | 326 | 7 | 46.57 | 35.15 | 0.22 |
| group_louvain | 103 | 5 | 20.60 | 12.14 | 0.16 |
| group_louvain | 207 | 8 | 25.88 | 14.11 | 0.17 |
| group_louvain | 326 | 9 | 36.22 | 19.08 | 0.18 |
| group_spinglass | 103 | 5 | 20.60 | 5.13 | 1.66 |
| group_spinglass | 207 | 8 | 25.88 | 13.38 | 4.04 |
| group_spinglass | 326 | 8 | 40.75 | 12.07 | 7.30 |
| group_walktrap | 103 | 103 | 1.00 | 0.00 | 0.16 |
| group_walktrap | 207 | 207 | 1.00 | 0.00 | 0.17 |
| group_walktrap | 326 | 326 | 1.00 | 0.00 | 0.17 |
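
Two features of the table can be checked with a few lines of base R. The sketch below first shows that `sd()` of a single value is NA, which accounts for the NA entries wherever only one group was found. It then computes how much slower three of the algorithms become between the smallest and largest networks, using running times copied from the table. The data frame and column names are illustrative only.

```r
# sd() of a single group size is undefined, so R returns NA.
single_group_sd <- sd(c(103))
print(is.na(single_group_sd))

# Running times (seconds) copied from the table for three algorithms.
timing <- data.frame(
  com_detect_fun = c("group_edge_betweenness", "group_spinglass", "group_louvain"),
  time_103_nodes = c(0.50, 1.66, 0.16),
  time_326_nodes = c(10.03, 7.30, 0.18)
)

# Ratio of running time on the largest network to the smallest one.
timing$slowdown <- timing$time_326_nodes / timing$time_103_nodes
print(timing[order(-timing$slowdown), ])
```

On these figures group_edge_betweenness slows down by roughly a factor of 20 and group_spinglass by about 4.4, while group_louvain is almost unchanged.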

The session information is shown below:

sessionInfo()
#> R version 4.2.2 (2022-10-31 ucrt)
#> Platform: x86_64-w64-mingw32/x64 (64-bit)
#> Running under: Windows 10 x64 (build 19044)
#> 
#> Matrix products: default
#> 
#> locale:
#> [1] LC_COLLATE=Chinese (Simplified)_China.utf8 
#> [2] LC_CTYPE=Chinese (Simplified)_China.utf8   
#> [3] LC_MONETARY=Chinese (Simplified)_China.utf8
#> [4] LC_NUMERIC=C                               
#> [5] LC_TIME=Chinese (Simplified)_China.utf8    
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> loaded via a namespace (and not attached):
#>  [1] rstudioapi_0.14   knitr_1.41        magrittr_2.0.3    R6_2.5.1         
#>  [5] ragg_1.2.4        rlang_1.0.6       fastmap_1.1.0     highr_0.10       
#>  [9] stringr_1.5.0     tools_4.2.2       xfun_0.36         cli_3.5.0        
#> [13] jquerylib_0.1.4   systemfonts_1.0.4 htmltools_0.5.4   yaml_2.3.6       
#> [17] digest_0.6.31     rprojroot_2.0.3   lifecycle_1.0.3   pkgdown_2.0.7    
#> [21] textshaping_0.3.6 purrr_1.0.0       sass_0.4.4        vctrs_0.5.1      
#> [25] fs_1.5.2          memoise_2.0.1     glue_1.6.2        cachem_1.0.6     
#> [29] evaluate_0.19     rmarkdown_2.19    stringi_1.7.8     compiler_4.2.2   
#> [33] bslib_0.4.2       desc_1.4.2        jsonlite_1.8.4