Subset of 'dplyr' verbs to work with data.table. Note that there is no group_by verb - use by or keyby argument when needed.

  • dt_mutate adds new variables or modify existing variables. If data is data.table then it modifies in-place.

  • dt_summarize computes summary statistics. Splits the data into subsets, computes summary statistics for each, and returns the result in the "data.table" form.

  • dt_summarize_all the same as dt_summarize but work over all non-grouping variables.

  • dt_filter Selects rows/cases where conditions are true. Rows where the condition evaluates to NA are dropped.

  • dt_select Selects column/variables from the data set.

  • dt_arrange sorts dataset by variable(-s). Use '-' to sort in descending order. If data is data.table then it modifies in-place.

dt_mutate(data, ..., by)

dt_summarize(data, ..., by, keyby, fun = NULL)

dt_summarize_all(data, fun, by, keyby)

dt_summarise(data, ..., by, keyby, fun = NULL)

dt_summarise_all(data, fun, by, keyby)

dt_select(data, ...)

dt_filter(data, ...)

dt_arrange(data, ..., na.last = FALSE)

Arguments

data

data.table/data.frame data.frame will be automatically converted to data.table. dt_mutate, dt_mutate_if, dt_mutate_if modify data.table object in-place.

...

List of variables or name-value pairs of summary/modifications functions. The name will be the name of the variable in the result. In the mutate function we can use a = b or a := b notation. Advantages of := are multiassignment (c("a", "b") := list(1,2)) and parametric assignment ((a) := 2).

by

unquoted name of grouping variable of list of unquoted names of grouping variables. For details see data.table

keyby

Same as by, but with an additional setkey() run on the by columns of the result, for convenience. It is common practice to use 'keyby=' routinely when you wish the result to be sorted. For details see data.table.

fun

function which will be applied to all variables in dt_summarize and dt_summarize_all.

na.last

logical. FALSE by default. If TRUE, missing values in the data are put last; if FALSE, they are put first.

Examples

# examples from 'dplyr' # newly created variables are available immediately mtcars %>% dt_mutate( cyl2 = cyl * 2, cyl4 = cyl2 * 2 ) %>% head()
#> mpg cyl disp hp drat wt qsec vs am gear carb cyl2 cyl4 #> 1: 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4 12 24 #> 2: 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4 12 24 #> 3: 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1 8 16 #> 4: 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1 12 24 #> 5: 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2 16 32 #> 6: 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1 12 24
# you can also use dt_mutate() to remove variables and # modify existing variables mtcars %>% dt_mutate( mpg = NULL, disp = disp * 0.0163871 # convert to litres ) %>% head()
#> cyl disp hp drat wt qsec vs am gear carb #> 1: 6 2.621936 110 3.90 2.620 16.46 0 1 4 4 #> 2: 6 2.621936 110 3.90 2.875 17.02 0 1 4 4 #> 3: 4 1.769807 93 3.85 2.320 18.61 1 1 4 1 #> 4: 6 4.227872 110 3.08 3.215 19.44 1 0 3 1 #> 5: 8 5.899356 175 3.15 3.440 17.02 0 0 3 2 #> 6: 6 3.687098 105 2.76 3.460 20.22 1 0 3 1
# window functions are useful for grouped mutates mtcars %>% dt_mutate( rank = rank(-mpg, ties.method = "min"), keyby = cyl) %>% print()
#> mpg cyl disp hp drat wt qsec vs am gear carb rank #> 1: 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1 8 #> 2: 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2 7 #> 3: 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2 8 #> 4: 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1 2 #> 5: 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4 2 3 #> 6: 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1 1 #> 7: 21.5 4 120.1 97 3.70 2.465 20.01 1 0 3 1 10 #> 8: 27.3 4 79.0 66 4.08 1.935 18.90 1 1 4 1 5 #> 9: 26.0 4 120.3 91 4.43 2.140 16.70 0 1 5 2 6 #> 10: 30.4 4 95.1 113 3.77 1.513 16.90 1 1 5 2 3 #> 11: 21.4 4 121.0 109 4.11 2.780 18.60 1 1 4 2 11 #> 12: 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4 2 #> 13: 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4 2 #> 14: 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1 1 #> 15: 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1 6 #> 16: 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4 5 #> 17: 17.8 6 167.6 123 3.92 3.440 18.90 1 0 4 4 7 #> 18: 19.7 6 145.0 175 3.62 2.770 15.50 0 1 5 6 4 #> 19: 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2 2 #> 20: 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4 11 #> 21: 16.4 8 275.8 180 3.07 4.070 17.40 0 0 3 3 4 #> 22: 17.3 8 275.8 180 3.07 3.730 17.60 0 0 3 3 3 #> 23: 15.2 8 275.8 180 3.07 3.780 18.00 0 0 3 3 7 #> 24: 10.4 8 472.0 205 2.93 5.250 17.98 0 0 3 4 13 #> 25: 10.4 8 460.0 215 3.00 5.424 17.82 0 0 3 4 13 #> 26: 14.7 8 440.0 230 3.23 5.345 17.42 0 0 3 4 10 #> 27: 15.5 8 318.0 150 2.76 3.520 16.87 0 0 3 2 6 #> 28: 15.2 8 304.0 150 3.15 3.435 17.30 0 0 3 2 7 #> 29: 13.3 8 350.0 245 3.73 3.840 15.41 0 0 3 4 12 #> 30: 19.2 8 400.0 175 3.08 3.845 17.05 0 0 3 2 1 #> 31: 15.8 8 351.0 264 4.22 3.170 14.50 0 1 5 4 5 #> 32: 15.0 8 301.0 335 3.54 3.570 14.60 0 1 5 8 9 #> mpg cyl disp hp drat wt qsec vs am gear carb rank
# You can drop variables by setting them to NULL mtcars %>% dt_mutate(cyl = NULL) %>% head()
#> mpg disp hp drat wt qsec vs am gear carb #> 1: 21.0 160 110 3.90 2.620 16.46 0 1 4 4 #> 2: 21.0 160 110 3.90 2.875 17.02 0 1 4 4 #> 3: 22.8 108 93 3.85 2.320 18.61 1 1 4 1 #> 4: 21.4 258 110 3.08 3.215 19.44 1 0 3 1 #> 5: 18.7 360 175 3.15 3.440 17.02 0 0 3 2 #> 6: 18.1 225 105 2.76 3.460 20.22 1 0 3 1
# A summary applied without by returns a single row mtcars %>% dt_summarise(mean = mean(disp), n = .N)
#> mean n #> 1: 230.7219 32
# Usually, you'll want to group first mtcars %>% dt_summarise(mean = mean(disp), n = .N, by = cyl)
#> cyl mean n #> 1: 6 183.3143 7 #> 2: 4 105.1364 11 #> 3: 8 353.1000 14
# Multiple 'by' - variables mtcars %>% dt_summarise(cyl_n = .N, by = list(cyl, vs))
#> cyl vs cyl_n #> 1: 6 0 3 #> 2: 4 1 10 #> 3: 6 1 4 #> 4: 8 0 14 #> 5: 4 0 1
# Newly created summaries immediately # doesn't overwrite existing variables mtcars %>% dt_summarise(disp = mean(disp), sd = sd(disp), by = cyl)
#> cyl disp sd #> 1: 6 183.3143 41.56246 #> 2: 4 105.1364 26.87159 #> 3: 8 353.1000 67.77132
# You can group by expressions: mtcars %>% dt_summarise_all(mean, by = list(vsam = vs + am))
#> vsam mpg cyl disp hp drat wt qsec #> 1: 1 20.28462 5.692308 189.4692 138.46154 3.738462 3.038846 18.04231 #> 2: 2 28.37143 4.000000 89.8000 80.57143 4.148571 2.028286 18.70000 #> 3: 0 15.05000 8.000000 357.6167 194.16667 3.120833 4.104083 17.14250 #> gear carb #> 1: 4.076923 3.307692 #> 2: 4.142857 1.428571 #> 3: 3.000000 3.083333
# filter by condition mtcars %>% dt_filter(am==0)
#> mpg cyl disp hp drat wt qsec vs am gear carb #> 1: 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1 #> 2: 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2 #> 3: 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1 #> 4: 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4 #> 5: 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2 #> 6: 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2 #> 7: 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4 #> 8: 17.8 6 167.6 123 3.92 3.440 18.90 1 0 4 4 #> 9: 16.4 8 275.8 180 3.07 4.070 17.40 0 0 3 3 #> 10: 17.3 8 275.8 180 3.07 3.730 17.60 0 0 3 3 #> 11: 15.2 8 275.8 180 3.07 3.780 18.00 0 0 3 3 #> 12: 10.4 8 472.0 205 2.93 5.250 17.98 0 0 3 4 #> 13: 10.4 8 460.0 215 3.00 5.424 17.82 0 0 3 4 #> 14: 14.7 8 440.0 230 3.23 5.345 17.42 0 0 3 4 #> 15: 21.5 4 120.1 97 3.70 2.465 20.01 1 0 3 1 #> 16: 15.5 8 318.0 150 2.76 3.520 16.87 0 0 3 2 #> 17: 15.2 8 304.0 150 3.15 3.435 17.30 0 0 3 2 #> 18: 13.3 8 350.0 245 3.73 3.840 15.41 0 0 3 4 #> 19: 19.2 8 400.0 175 3.08 3.845 17.05 0 0 3 2
# filter by compound condition mtcars %>% dt_filter(am==0, mpg>mean(mpg))
#> mpg cyl disp hp drat wt qsec vs am gear carb #> 1: 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1 #> 2: 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2 #> 3: 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2 #> 4: 21.5 4 120.1 97 3.70 2.465 20.01 1 0 3 1
# select mtcars %>% dt_select(vs:carb, cyl)
#> vs am gear carb cyl #> 1: 0 1 4 4 6 #> 2: 0 1 4 4 6 #> 3: 1 1 4 1 4 #> 4: 1 0 3 1 6 #> 5: 0 0 3 2 8 #> 6: 1 0 3 1 6 #> 7: 0 0 3 4 8 #> 8: 1 0 4 2 4 #> 9: 1 0 4 2 4 #> 10: 1 0 4 4 6 #> 11: 1 0 4 4 6 #> 12: 0 0 3 3 8 #> 13: 0 0 3 3 8 #> 14: 0 0 3 3 8 #> 15: 0 0 3 4 8 #> 16: 0 0 3 4 8 #> 17: 0 0 3 4 8 #> 18: 1 1 4 1 4 #> 19: 1 1 4 2 4 #> 20: 1 1 4 1 4 #> 21: 1 0 3 1 4 #> 22: 0 0 3 2 8 #> 23: 0 0 3 2 8 #> 24: 0 0 3 4 8 #> 25: 0 0 3 2 8 #> 26: 1 1 4 1 4 #> 27: 0 1 5 2 4 #> 28: 1 1 5 2 4 #> 29: 0 1 5 4 8 #> 30: 0 1 5 6 6 #> 31: 0 1 5 8 8 #> 32: 1 1 4 2 4 #> vs am gear carb cyl
mtcars %>% dt_select(-am, -cyl)
#> mpg disp hp drat wt qsec vs gear carb #> 1: 21.0 160.0 110 3.90 2.620 16.46 0 4 4 #> 2: 21.0 160.0 110 3.90 2.875 17.02 0 4 4 #> 3: 22.8 108.0 93 3.85 2.320 18.61 1 4 1 #> 4: 21.4 258.0 110 3.08 3.215 19.44 1 3 1 #> 5: 18.7 360.0 175 3.15 3.440 17.02 0 3 2 #> 6: 18.1 225.0 105 2.76 3.460 20.22 1 3 1 #> 7: 14.3 360.0 245 3.21 3.570 15.84 0 3 4 #> 8: 24.4 146.7 62 3.69 3.190 20.00 1 4 2 #> 9: 22.8 140.8 95 3.92 3.150 22.90 1 4 2 #> 10: 19.2 167.6 123 3.92 3.440 18.30 1 4 4 #> 11: 17.8 167.6 123 3.92 3.440 18.90 1 4 4 #> 12: 16.4 275.8 180 3.07 4.070 17.40 0 3 3 #> 13: 17.3 275.8 180 3.07 3.730 17.60 0 3 3 #> 14: 15.2 275.8 180 3.07 3.780 18.00 0 3 3 #> 15: 10.4 472.0 205 2.93 5.250 17.98 0 3 4 #> 16: 10.4 460.0 215 3.00 5.424 17.82 0 3 4 #> 17: 14.7 440.0 230 3.23 5.345 17.42 0 3 4 #> 18: 32.4 78.7 66 4.08 2.200 19.47 1 4 1 #> 19: 30.4 75.7 52 4.93 1.615 18.52 1 4 2 #> 20: 33.9 71.1 65 4.22 1.835 19.90 1 4 1 #> 21: 21.5 120.1 97 3.70 2.465 20.01 1 3 1 #> 22: 15.5 318.0 150 2.76 3.520 16.87 0 3 2 #> 23: 15.2 304.0 150 3.15 3.435 17.30 0 3 2 #> 24: 13.3 350.0 245 3.73 3.840 15.41 0 3 4 #> 25: 19.2 400.0 175 3.08 3.845 17.05 0 3 2 #> 26: 27.3 79.0 66 4.08 1.935 18.90 1 4 1 #> 27: 26.0 120.3 91 4.43 2.140 16.70 0 5 2 #> 28: 30.4 95.1 113 3.77 1.513 16.90 1 5 2 #> 29: 15.8 351.0 264 4.22 3.170 14.50 0 5 4 #> 30: 19.7 145.0 175 3.62 2.770 15.50 0 5 6 #> 31: 15.0 301.0 335 3.54 3.570 14.60 0 5 8 #> 32: 21.4 121.0 109 4.11 2.780 18.60 1 4 2 #> mpg disp hp drat wt qsec vs gear carb
# sorting dt_arrange(mtcars, cyl, disp)
#> mpg cyl disp hp drat wt qsec vs am gear carb #> 1: 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1 #> 2: 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4 2 #> 3: 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1 #> 4: 27.3 4 79.0 66 4.08 1.935 18.90 1 1 4 1 #> 5: 30.4 4 95.1 113 3.77 1.513 16.90 1 1 5 2 #> 6: 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1 #> 7: 21.5 4 120.1 97 3.70 2.465 20.01 1 0 3 1 #> 8: 26.0 4 120.3 91 4.43 2.140 16.70 0 1 5 2 #> 9: 21.4 4 121.0 109 4.11 2.780 18.60 1 1 4 2 #> 10: 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2 #> 11: 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2 #> 12: 19.7 6 145.0 175 3.62 2.770 15.50 0 1 5 6 #> 13: 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4 #> 14: 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4 #> 15: 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4 #> 16: 17.8 6 167.6 123 3.92 3.440 18.90 1 0 4 4 #> 17: 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1 #> 18: 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1 #> 19: 16.4 8 275.8 180 3.07 4.070 17.40 0 0 3 3 #> 20: 17.3 8 275.8 180 3.07 3.730 17.60 0 0 3 3 #> 21: 15.2 8 275.8 180 3.07 3.780 18.00 0 0 3 3 #> 22: 15.0 8 301.0 335 3.54 3.570 14.60 0 1 5 8 #> 23: 15.2 8 304.0 150 3.15 3.435 17.30 0 0 3 2 #> 24: 15.5 8 318.0 150 2.76 3.520 16.87 0 0 3 2 #> 25: 13.3 8 350.0 245 3.73 3.840 15.41 0 0 3 4 #> 26: 15.8 8 351.0 264 4.22 3.170 14.50 0 1 5 4 #> 27: 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2 #> 28: 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4 #> 29: 19.2 8 400.0 175 3.08 3.845 17.05 0 0 3 2 #> 30: 14.7 8 440.0 230 3.23 5.345 17.42 0 0 3 4 #> 31: 10.4 8 460.0 215 3.00 5.424 17.82 0 0 3 4 #> 32: 10.4 8 472.0 205 2.93 5.250 17.98 0 0 3 4 #> mpg cyl disp hp drat wt qsec vs am gear carb
dt_arrange(mtcars, -disp)
#> mpg cyl disp hp drat wt qsec vs am gear carb #> 1: 10.4 8 472.0 205 2.93 5.250 17.98 0 0 3 4 #> 2: 10.4 8 460.0 215 3.00 5.424 17.82 0 0 3 4 #> 3: 14.7 8 440.0 230 3.23 5.345 17.42 0 0 3 4 #> 4: 19.2 8 400.0 175 3.08 3.845 17.05 0 0 3 2 #> 5: 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2 #> 6: 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4 #> 7: 15.8 8 351.0 264 4.22 3.170 14.50 0 1 5 4 #> 8: 13.3 8 350.0 245 3.73 3.840 15.41 0 0 3 4 #> 9: 15.5 8 318.0 150 2.76 3.520 16.87 0 0 3 2 #> 10: 15.2 8 304.0 150 3.15 3.435 17.30 0 0 3 2 #> 11: 15.0 8 301.0 335 3.54 3.570 14.60 0 1 5 8 #> 12: 16.4 8 275.8 180 3.07 4.070 17.40 0 0 3 3 #> 13: 17.3 8 275.8 180 3.07 3.730 17.60 0 0 3 3 #> 14: 15.2 8 275.8 180 3.07 3.780 18.00 0 0 3 3 #> 15: 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1 #> 16: 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1 #> 17: 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4 #> 18: 17.8 6 167.6 123 3.92 3.440 18.90 1 0 4 4 #> 19: 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4 #> 20: 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4 #> 21: 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2 #> 22: 19.7 6 145.0 175 3.62 2.770 15.50 0 1 5 6 #> 23: 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2 #> 24: 21.4 4 121.0 109 4.11 2.780 18.60 1 1 4 2 #> 25: 26.0 4 120.3 91 4.43 2.140 16.70 0 1 5 2 #> 26: 21.5 4 120.1 97 3.70 2.465 20.01 1 0 3 1 #> 27: 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1 #> 28: 30.4 4 95.1 113 3.77 1.513 16.90 1 1 5 2 #> 29: 27.3 4 79.0 66 4.08 1.935 18.90 1 1 4 1 #> 30: 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1 #> 31: 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4 2 #> 32: 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1 #> mpg cyl disp hp drat wt qsec vs am gear carb