R/keyword_group.R
keyword_group.Rd
Create a tbl_graph (a class provided by tidygraph) from a tidy table with document ID and keyword.
Each entry (row) should contain only one keyword, in tidy format. This function automatically computes
the frequency and classification group number of the nodes representing keywords.
keyword_group(
  dt,
  id = "id",
  keyword = "keyword",
  top = 200,
  min_freq = 1,
  com_detect_fun = group_fast_greedy
)
dt
A data.frame containing at least two columns with document ID and keyword.
id
Quoted characters specifying the column name of the document ID. Default uses "id".
keyword
Quoted characters specifying the column name of the keyword. Default uses "keyword".
top
The number of most frequent keywords to select. If there is a tie, more than top entries will be selected. Default uses 200.
min_freq
Minimum occurrence of selected keywords. Default uses 1.
com_detect_fun
Community detection function provided by tidygraph (wrappers around clustering functions provided by igraph). See group_graph for other available algorithms. Default uses group_fast_greedy.
A tbl_graph representing the keyword co-occurrence network, with the frequency and group number of each keyword.
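To make the idea of a keyword co-occurrence network concrete, the following base R sketch counts how often two keywords appear in the same document. It is only an illustration under the assumption that the edge weight n counts shared documents; it is not the package's implementation, and the helper name cooccurrence_edges and the toy data are hypothetical.

```r
# Hypothetical helper (not part of akc): build an edge list of keyword pairs
# with n = number of documents in which both keywords occur (assumed meaning).
cooccurrence_edges <- function(dt, id = "id", keyword = "keyword") {
  empty <- data.frame(from = character(0), to = character(0), n = integer(0),
                      stringsAsFactors = FALSE)
  # One character vector of keywords per document.
  by_doc <- split(dt[[keyword]], dt[[id]])
  # All unordered keyword pairs within each document.
  pairs <- lapply(by_doc, function(kw) {
    kw <- sort(unique(kw))
    if (length(kw) < 2) return(NULL)
    t(combn(kw, 2))
  })
  pairs <- do.call(rbind, pairs)
  if (is.null(pairs)) return(empty)
  # Count how many documents contain each pair.
  edges <- as.data.frame(table(from = pairs[, 1], to = pairs[, 2]),
                         stringsAsFactors = FALSE)
  edges <- edges[edges$Freq > 0, ]
  names(edges)[3] <- "n"
  rownames(edges) <- NULL
  edges
}

# Toy tidy table: one keyword per row.
toy <- data.frame(
  id = c(1, 1, 2, 2, 3, 3, 4),
  keyword = c("a", "b", "a", "c", "a", "b", "c"),
  stringsAsFactors = FALSE
)
cooccurrence_edges(toy)
```

In this toy table, "a" and "b" share two documents and "a" and "c" share one, so the edge list has two rows.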
This function receives a tidy table with document ID and keyword. Only the top keywords with the largest frequency are selected, and the minimum occurrence of keywords can be specified. For suggestions on choosing a community detection algorithm, see the references provided below.
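The selection rule described here (keep the top most frequent keywords, ties included, subject to a minimum frequency) can be sketched in base R. This is a minimal sketch of the rule as documented, not the package's actual code; the helper name select_top_keywords and the toy data are hypothetical.

```r
# Hypothetical helper (not part of akc) illustrating the selection rule.
select_top_keywords <- function(dt, keyword = "keyword", top = 200, min_freq = 1) {
  # Frequency of each keyword across all rows.
  freq <- table(dt[[keyword]])
  # Drop keywords that occur fewer than min_freq times.
  freq <- freq[freq >= min_freq]
  if (length(freq) == 0) {
    return(data.frame(keyword = character(0), freq = integer(0),
                      stringsAsFactors = FALSE))
  }
  # Tied keywords share the same (minimum) rank, so keeping every keyword
  # with rank <= top can return more than `top` entries.
  rk <- rank(-as.numeric(freq), ties.method = "min")
  keep <- freq[rk <= top]
  data.frame(keyword = names(keep), freq = as.integer(keep),
             stringsAsFactors = FALSE)
}

# Toy tidy table: "a" occurs 3 times, "b" and "c" twice each.
toy <- data.frame(
  id = c(1, 1, 2, 2, 3, 3, 4),
  keyword = c("a", "b", "a", "c", "a", "b", "c"),
  stringsAsFactors = FALSE
)
select_top_keywords(toy, top = 2)
```

With top = 2, all three keywords are kept because "b" and "c" tie for second place; raising min_freq to 3 would leave only "a".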
de Sousa, F. B., & Zhao, L. (2014). Evaluating and comparing the igraph community detection algorithms. 2014 Brazilian Conference on Intelligent Systems. IEEE.
Yang, Z., Algesheimer, R., & Tessone, C. J. (2016). A comparative analysis of community detection algorithms on artificial networks. Scientific Reports, 6, 30750.
library(akc)
# \donttest{
bibli_data_table %>%
  keyword_clean(id = "id", keyword = "keyword") %>%
  keyword_group(id = "id", keyword = "keyword")
#> # A tbl_graph: 203 nodes and 1223 edges
#> #
#> # An undirected simple graph with 1 component
#> #
#> # Node Data: 203 × 3 (active)
#> name freq group
#> <chr> <int> <int>
#> 1 information literacy 58 4
#> 2 academic libraries 133 1
#> 3 archives 12 4
#> 4 higher education 16 4
#> 5 bibliometrics 31 3
#> 6 assessment 15 2
#> # … with 197 more rows
#> #
#> # Edge Data: 1,223 × 3
#> from to n
#> <int> <int> <int>
#> 1 1 116 14
#> 2 1 2 12
#> 3 2 29 8
#> # … with 1,220 more rows
# use 'louvain' algorithm for community detection
bibli_data_table %>%
  keyword_clean(id = "id", keyword = "keyword") %>%
  keyword_group(id = "id", keyword = "keyword",
                com_detect_fun = group_louvain)
#> # A tbl_graph: 203 nodes and 1223 edges
#> #
#> # An undirected simple graph with 1 component
#> #
#> # Node Data: 203 × 3 (active)
#> name freq group
#> <chr> <int> <int>
#> 1 information literacy 58 7
#> 2 academic libraries 133 4
#> 3 archives 12 2
#> 4 higher education 16 6
#> 5 bibliometrics 31 5
#> 6 assessment 15 7
#> # … with 197 more rows
#> #
#> # Edge Data: 1,223 × 3
#> from to n
#> <int> <int> <int>
#> 1 1 116 14
#> 2 1 2 12
#> 3 2 29 8
#> # … with 1,220 more rows
# get more alternatives by searching '?tidygraph::group_graph'
# }