tidyfst is a toolkit of tidy data manipulation verbs with data.table as the backend. Combining the elegant syntax of dplyr with the computing performance of data.table, tidyfst aims to provide state-of-the-art data manipulation tools with the least pain. The package is an extension of data.table: besides offering a tidy syntax, it wraps combinations of efficient functions to facilitate frequently used data operations. tidyfst also introduces tidy data verbs from other packages, including but not limited to tidyverse and data.table. If you are a dplyr user who needs data.table for fast computation, or a data.table user looking for more readable syntax, tidyfst is designed for you (and me, of course). For further details and tutorials, see the vignettes, which are available in both Chinese and English.
tidyfst now has an API that in some respects goes beyond its predecessors (e.g. select_dt accepts nearly anything for column selection). Enjoy the efficient data operations in tidyfst!
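As a rough illustration of that flexibility, the sketch below calls select_dt with several kinds of selectors on the built-in iris data. It assumes a recent tidyfst release in which select_dt accepts bare column names, integer positions, a regular-expression string, and a minus sign for exclusion; the variable names are only for illustration.

```r
library(tidyfst)

# Bare column names, as in dplyr::select()
by_name  <- iris %>% select_dt(Sepal.Length, Sepal.Width)

# Integer positions
by_index <- iris %>% select_dt(1:3)

# A character string is treated as a regular expression on column names
by_regex <- iris %>% select_dt("Pe")

# A leading minus drops the named column
dropped  <- iris %>% select_dt(-Species)

names(by_regex)
```

Each call returns a data.table, so the results can be piped straight into other tidyfst verbs.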
PS: For extreme performance in tidy syntax, try tidyfst’s mirror package tidyft.
library(tidyfst)

# Chain several verbs on iris
iris %>%
  mutate_dt(group = Species, sl = Sepal.Length, sw = Sepal.Width) %>%
  select_dt(group, sl, sw) %>%
  filter_dt(sl > 5) %>%
  arrange_dt(group, sl) %>%
  distinct_dt(sl, .keep_all = TRUE) %>%
  summarise_dt(sw = max(sw), by = group)
#>         group    sw
#>        <fctr> <num>
#> 1:     setosa   4.4
#> 2: versicolor   3.4
#> 3:  virginica   3.8

# Grouped summary
mtcars %>%
  group_dt(
    by = .(vs, am),
    summarise_dt(avg = mean(mpg))
  )
#>       vs    am      avg
#>    <num> <num>    <num>
#> 1:     0     1 19.75000
#> 2:     1     1 28.37143
#> 3:     1     0 20.74286
#> 4:     0     0 15.05000

# Update columns only in rows where the condition holds
iris[3:8, ] %>%
  mutate_when(Petal.Width == .2,
              one = 1, Sepal.Length = 2)
#>    Sepal.Length Sepal.Width Petal.Length Petal.Width Species   one
#>           <num>       <num>        <num>       <num>  <fctr> <num>
#> 1:          2.0         3.2          1.3         0.2  setosa     1
#> 2:          2.0         3.1          1.5         0.2  setosa     1
#> 3:          2.0         3.6          1.4         0.2  setosa     1
#> 4:          5.4         3.9          1.7         0.4  setosa    NA
#> 5:          4.6         3.4          1.4         0.3  setosa    NA
#> 6:          2.0         3.4          1.5         0.2  setosa     1
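For readers coming from data.table, it may help to see the grouped mean on mtcars written both ways. The following is a minimal sketch, not a description of tidyfst's internals: it only assumes that the group_dt call shown above and a plain data.table expression compute the same per-group averages. The names tidy_res and dt_res are made up for the comparison.

```r
library(tidyfst)
library(data.table)

# tidyfst: grouped summary in tidy syntax
tidy_res <- mtcars %>%
  group_dt(
    by = .(vs, am),
    summarise_dt(avg = mean(mpg))
  )

# data.table: the same aggregation in native bracket syntax
dt_res <- as.data.table(mtcars)[, .(avg = mean(mpg)), by = .(vs, am)]

# Compare the per-group averages from both approaches
all.equal(tidy_res$avg, dt_res$avg)
```

The tidy form reads as a pipeline of named steps, while the bracket form packs the summary and the grouping into one expression.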
tidyfst will keep up with updates to data.table. As a next step, it will introduce new features that improve performance and flexibility, to facilitate fast data manipulation in tidy syntax.