Tidy Verbs for data.table • tidydt

Overview

tidydt is a toolkit of tidy data manipulation verbs with data.table as the backend. Combining the elegant syntax of dplyr with the computing performance of data.table, tidydt aims to provide state-of-the-art data manipulation tools with the least pain. The package is inspired by maditr but follows a different design philosophy: it prohibits in-place replacement and uses a "_dt" suffix API. tidydt will also introduce more tidy data verbs from other packages, including but not limited to the tidyverse and data.table. If you are a dplyr user who has to use data.table for fast computation, or a data.table user looking for more readable syntax, tidydt is designed for you (and me, of course). For further details and tutorials, see the vignettes.

Enjoy data science in tidydt!

Features

Always receives a data.frame (tibble/data.table/data.frame) and returns a data.table.
Never uses in-place replacement.
Uses a suffix rather than a prefix to make typing more efficient (especially in an IDE with automatic code completion).
More verbs for big data manipulation.
Supports data importing and parsing with fst; for details, see parse_fst, select_fst and filter_fst.
Flagship functions: group_dt, unnest_dt, mutate_when, etc.
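
To illustrate what the "never use in-place replacement" rule guards against, here is a minimal sketch in plain data.table. It is not tidydt's actual implementation; it only assumes the data.table package is installed, and the variable names and the ratio column are made up for the illustration.

```r
library(data.table)

dt <- as.data.table(head(iris, 3))

# `<-` does not copy a data.table, so `alias` and `dt` are the same object.
alias <- dt
# `:=` adds the column by reference, which also changes `dt`.
alias[, ratio := Sepal.Length / Sepal.Width]
changed_by_reference <- "ratio" %in% names(dt)          # TRUE

# Copy-first style: take an explicit copy, then modify only the copy.
original <- as.data.table(head(iris, 3))
modified <- copy(original)[, ratio := Sepal.Length / Sepal.Width]
original_untouched <- !("ratio" %in% names(original))   # TRUE
```

A verb that always works on a copy trades some memory for the guarantee that the input is left as it was, which is the behaviour the feature list describes.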

Installation

devtools::install_github("hope-data-science/tidydt")

Example

library(tidydt)

iris %>%
  mutate_dt(group = Species,sl = Sepal.Length,sw = Sepal.Width) %>%
  select_dt(group,sl,sw) %>%
  filter_dt(sl > 5) %>%
  arrange_dt(group,sl) %>%
  distinct_dt(sl,.keep_all = TRUE) %>%
  summarise_dt(sw = max(sw),by = group)
#>         group  sw
#> 1:     setosa 4.4
#> 2: versicolor 3.4
#> 3:  virginica 3.8

mtcars %>%
  group_dt(by = .(vs,am),
           summarise_dt(avg = mean(mpg)))
#>    vs am      avg
#> 1:  0  1 19.75000
#> 2:  1  1 28.37143
#> 3:  1  0 20.74286
#> 4:  0  0 15.05000
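
For readers coming from data.table, the grouped summary above corresponds roughly to the following plain data.table call. This is a sketch for comparison only and says nothing about how group_dt is implemented internally; the names mt and avg_by_group are made up for the illustration.

```r
library(data.table)

mt <- as.data.table(mtcars)

# Mean mpg for each combination of vs and am;
# groups appear in order of first occurrence in the data.
avg_by_group <- mt[, .(avg = mean(mpg)), by = .(vs, am)]
print(avg_by_group)
```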

iris[3:8,] %>%
  mutate_when(Petal.Width == .2,
              one = 1,Sepal.Length=2)
#>    Sepal.Length Sepal.Width Petal.Length Petal.Width Species one
#> 1:          2.0         3.2          1.3         0.2  setosa   1
#> 2:          2.0         3.1          1.5         0.2  setosa   1
#> 3:          2.0         3.6          1.4         0.2  setosa   1
#> 4:          5.4         3.9          1.7         0.4  setosa  NA
#> 5:          4.6         3.4          1.4         0.3  setosa  NA
#> 6:          2.0         3.4          1.5         0.2  setosa   1

Future plans

unnest_dt is now fast enough to beat tidyr::unnest, but nest_dt builds a nested data.table with data.tables inside. How best to use such a data structure remains to be seen, and its performance is still to be explored.
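
As a rough picture of what a "data.table with data.table inside" looks like, the sketch below builds one with plain data.table and then flattens it again. It is an assumption-laden illustration, not the code behind nest_dt or unnest_dt; the list column name data and the variable names are hypothetical.

```r
library(data.table)

dt <- as.data.table(iris)

# Nest: one row per Species, with the remaining columns stored
# as a data.table inside a list column called `data`.
nested <- dt[, .(data = list(.SD)), by = Species]

# Unnest: expand each stored table back into ordinary rows.
unnested <- nested[, data[[1]], by = Species]
```

Here nested has one row per species, and unnested has the same number of rows and columns as the original iris table.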

Acknowledgement

The author of maditr, Gregory Demin, and the author of fst, Marcus Klik, have helped me a lot in the development of this work. I am lucky to have them (and many other selfless contributors) in the same open-source R community.
