Carry out data manipulation within specified groups. Unlike group_dt, the implementation is split into two operations: grouping and execution. It uses setkey and setkeyv from data.table to provide functionality similar to group_by in dplyr. This is both convenient and computationally efficient.
group_by_dt(.data, ..., cols = NULL)
group_exe_dt(.data, ...)
.data: A data frame.
...: For group_by_dt, the variables to group by, namely the columns to sort by. Do not quote the column names. For group_exe_dt, any data manipulation arguments that could be applied to a data.frame. It can receive what select_dt receives.
cols: A character vector of column names to group by.
A data.table with keys
group_by_dt and group_exe_dt are a pair of functions to be used in combination. They rely on the key-setting feature of data.table, which provides high performance for grouped operations, especially when you have to operate on the same groups frequently.
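The two-step idea described above can be sketched with data.table alone. This is a minimal illustration of keyed grouping, not the internal implementation of group_by_dt or group_exe_dt; the object names dt and res are chosen here for illustration only.

```r
library(data.table)

# Step 1 (grouping): convert to a data.table and set the keys.
# setkeyv() sorts the table by cyl, then am, and modifies dt by reference.
dt <- as.data.table(mtcars)
setkeyv(dt, c("cyl", "am"))
key(dt)  # the stored key columns

# Step 2 (execution): reuse the stored key as the grouping variable,
# so the groups do not have to be named again.
res <- dt[, .(mpg_sum = sum(mpg)), by = key(dt)]
res
```

Because the key is stored on the table, several grouped operations can be run one after another without specifying the grouping columns again.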
# aggregation after grouping using group_exe_dt
as.data.table(iris) -> a
a %>%
  group_by_dt(Species) %>%
  group_exe_dt(head(1))
#> Key: <Species>
#>       Species Sepal.Length Sepal.Width Petal.Length Petal.Width
#>        <fctr>        <num>       <num>        <num>       <num>
#> 1:     setosa          5.1         3.5          1.4         0.2
#> 2: versicolor          7.0         3.2          4.7         1.4
#> 3:  virginica          6.3         3.3          6.0         2.5
a %>%
  group_by_dt(Species) %>%
  group_exe_dt(
    head(3) %>%
      summarise_dt(sum = sum(Sepal.Length))
  )
#> Key: <Species>
#>       Species   sum
#>        <fctr> <num>
#> 1:     setosa  14.7
#> 2: versicolor  20.3
#> 3:  virginica  19.2
mtcars %>%
  group_by_dt("cyl|am") %>%
  group_exe_dt(
    summarise_dt(mpg_sum = sum(mpg))
  )
#> Key: <cyl, am>
#>      cyl    am mpg_sum
#>    <num> <num>   <num>
#> 1:     4     0    68.7
#> 2:     4     1   224.6
#> 3:     6     0    76.5
#> 4:     6     1    61.7
#> 5:     8     0   180.6
#> 6:     8     1    30.8
# equivalent to
mtcars %>%
  group_by_dt(cols = c("cyl", "am")) %>%
  group_exe_dt(
    summarise_dt(mpg_sum = sum(mpg))
  )
#> Key: <cyl, am>
#>      cyl    am mpg_sum
#>    <num> <num>   <num>
#> 1:     4     0    68.7
#> 2:     4     1   224.6
#> 3:     6     0    76.5
#> 4:     6     1    61.7
#> 5:     8     0   180.6
#> 6:     8     1    30.8