Group by variable(s) and implement operations

Carry out data manipulation within specified groups. Unlike group_dt, the work is split into two operations: grouping and execution.

These functions use setkey and setkeyv from data.table to provide functionality similar to group_by in dplyr. This is both convenient and computationally efficient.
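The key-based mechanism can be sketched in plain data.table. This is only an illustration of the idea, not the package's actual implementation; the object names dt and res are made up for the example.

```r
library(data.table)

# Work on a data.table copy of iris so the key can be set by reference
dt <- as.data.table(iris)

# Grouping step: setkeyv() sorts the table by the given column(s)
# and records them as the key
setkeyv(dt, "Species")

# Execution step: reuse the stored key as the grouping variable
res <- dt[, .(sum = sum(Sepal.Length)), by = key(dt)]
res
```

Because the key is stored on the table itself, the execution step does not need to be told the grouping columns again.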

group_by_dt(.data, ..., cols = NULL)

group_exe_dt(.data, ...)

Arguments

.data: A data frame
...: For group_by_dt, the variables to group by, which are also the columns to sort by. Do not quote the column names. For group_exe_dt, any data manipulation that could be applied to a data.frame. It can receive what select_dt receives.
cols: A character vector of column names to group by.

Value

A data.table with keys

Details

group_by_dt and group_exe_dt are a pair of functions designed to be used together. They rely on the key-setting feature of data.table, which provides high performance for grouped operations, especially when you operate on the same groups repeatedly.
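As a rough sketch of the "operate on the same groups repeatedly" case, the keyed table can be stored once and passed to several group_exe_dt calls. This assumes the result of group_by_dt can be kept in a variable and reused; the names keyed, mpg_by_cyl and top_by_cyl are hypothetical.

```r
library(tidyfst)

# Set the key once and keep the keyed table
keyed <- group_by_dt(mtcars, cyl)

# First grouped operation: total mpg for each value of cyl
mpg_by_cyl <- keyed %>%
  group_exe_dt(summarise_dt(mpg_sum = sum(mpg)))

# Second grouped operation on the same keyed table:
# the first two rows of each group
top_by_cyl <- keyed %>%
  group_exe_dt(head(2))
```

Only the first call pays the cost of sorting by cyl; the later calls reuse the key that is already set.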

Examples


# aggregation after grouping using group_exe_dt
as.data.table(iris) -> a
a %>%
  group_by_dt(Species) %>%
  group_exe_dt(head(1))
#> Key: <Species>
#>       Species Sepal.Length Sepal.Width Petal.Length Petal.Width
#>        <fctr>        <num>       <num>        <num>       <num>
#> 1:     setosa          5.1         3.5          1.4         0.2
#> 2: versicolor          7.0         3.2          4.7         1.4
#> 3:  virginica          6.3         3.3          6.0         2.5

a %>%
  group_by_dt(Species) %>%
  group_exe_dt(
    head(3) %>%
      summarise_dt(sum = sum(Sepal.Length))
  )
#> Key: <Species>
#>       Species   sum
#>        <fctr> <num>
#> 1:     setosa  14.7
#> 2: versicolor  20.3
#> 3:  virginica  19.2

mtcars %>%
  group_by_dt("cyl|am") %>%
  group_exe_dt(
    summarise_dt(mpg_sum = sum(mpg))
  )
#> Key: <cyl, am>
#>      cyl    am mpg_sum
#>    <num> <num>   <num>
#> 1:     4     0    68.7
#> 2:     4     1   224.6
#> 3:     6     0    76.5
#> 4:     6     1    61.7
#> 5:     8     0   180.6
#> 6:     8     1    30.8
# equivalent to
mtcars %>%
  group_by_dt(cols = c("cyl","am")) %>%
  group_exe_dt(
    summarise_dt(mpg_sum = sum(mpg))
  )
#> Key: <cyl, am>
#>      cyl    am mpg_sum
#>    <num> <num>   <num>
#> 1:     4     0    68.7
#> 2:     4     1   224.6
#> 3:     6     0    76.5
#> 4:     6     1    61.7
#> 5:     8     0   180.6
#> 6:     8     1    30.8