This vignette follows dplyr’s introductory vignette at https://cran.r-project.org/web/packages/dplyr/vignettes/dplyr.html and tries to reproduce all of its results. First, load the needed packages.
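A minimal setup sketch. It assumes that the `*_dt()` functions used throughout come from the tidyfst package and that `flights` is the dataset from nycflights13, as in the dplyr vignette; neither package is named explicitly in this text.

```r
# Assumption: the *_dt() verbs are provided by tidyfst, and
# flights is the nycflights13 dataset used in dplyr's vignette.
library(tidyfst)
library(nycflights13)

# Flights that departed New York City in 2013: rows and columns
dim(flights)
```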

Arrange rows with arrange_dt()

Use - (minus symbol) to order a column in descending order:
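A short sketch of a descending sort, assuming `arrange_dt()` is loaded from tidyfst and `flights` from nycflights13 (both package names are assumptions here):

```r
library(tidyfst)       # assumed source of arrange_dt()
library(nycflights13)  # provides flights

# Sort by arrival delay in descending order
by_delay <- arrange_dt(flights, -arr_delay)
head(by_delay$arr_delay)
```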

Select columns with select_dt()

select_dt(flights, year:day) and select_dt(flights, -(year:day)) are not supported. However, I have added the ability to select columns with a regular expression, which means you can:
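A hedged sketch of regular-expression selection, assuming that a character string passed to `select_dt()` is treated as a pattern matched against the column names (package names are assumptions as well):

```r
library(tidyfst)       # assumed source of select_dt()
library(nycflights13)  # provides flights

# Keep only the columns whose names start with "dep"
dep_cols <- select_dt(flights, "^dep")
names(dep_cols)
```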

Renaming works almost the same way as in dplyr. select_dt() keeps only the renamed column, whereas rename_dt() keeps every column:

select_dt(flights, tail_num = tailnum)
#>         tail_num
#>      1:   N14228
#>      2:   N24211
#>      3:   N619AA
#>      4:   N804JB
#>      5:   N668DN
#>     ---         
#> 336772:     <NA>
#> 336773:     <NA>
#> 336774:   N535MQ
#> 336775:   N511MQ
#> 336776:   N839MQ
rename_dt(flights, tail_num = tailnum)
#>         year month day dep_time sched_dep_time dep_delay arr_time
#>      1: 2013     1   1      517            515         2      830
#>      2: 2013     1   1      533            529         4      850
#>      3: 2013     1   1      542            540         2      923
#>      4: 2013     1   1      544            545        -1     1004
#>      5: 2013     1   1      554            600        -6      812
#>     ---                                                          
#> 336772: 2013     9  30       NA           1455        NA       NA
#> 336773: 2013     9  30       NA           2200        NA       NA
#> 336774: 2013     9  30       NA           1210        NA       NA
#> 336775: 2013     9  30       NA           1159        NA       NA
#> 336776: 2013     9  30       NA            840        NA       NA
#>         sched_arr_time arr_delay carrier flight tail_num origin dest air_time
#>      1:            819        11      UA   1545   N14228    EWR  IAH      227
#>      2:            830        20      UA   1714   N24211    LGA  IAH      227
#>      3:            850        33      AA   1141   N619AA    JFK  MIA      160
#>      4:           1022       -18      B6    725   N804JB    JFK  BQN      183
#>      5:            837       -25      DL    461   N668DN    LGA  ATL      116
#>     ---                                                                      
#> 336772:           1634        NA      9E   3393     <NA>    JFK  DCA       NA
#> 336773:           2312        NA      9E   3525     <NA>    LGA  SYR       NA
#> 336774:           1330        NA      MQ   3461   N535MQ    LGA  BNA       NA
#> 336775:           1344        NA      MQ   3572   N511MQ    LGA  CLE       NA
#> 336776:           1020        NA      MQ   3531   N839MQ    LGA  RDU       NA
#>         distance hour minute           time_hour
#>      1:     1400    5     15 2013-01-01 05:00:00
#>      2:     1416    5     29 2013-01-01 05:00:00
#>      3:     1089    5     40 2013-01-01 05:00:00
#>      4:     1576    5     45 2013-01-01 05:00:00
#>      5:      762    6      0 2013-01-01 06:00:00
#>     ---                                         
#> 336772:      213   14     55 2013-09-30 14:00:00
#> 336773:      198   22      0 2013-09-30 22:00:00
#> 336774:      764   12     10 2013-09-30 12:00:00
#> 336775:      419   11     59 2013-09-30 11:00:00
#> 336776:      431    8     40 2013-09-30 08:00:00

Add new columns with mutate_dt()

However, if a new column refers to another column created in the same call, split the work into separate calls. The following code would not work:
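A sketch of the kind of call this refers to, reconstructed from the working two-step version that follows, so treat the exact form as an assumption. The call is wrapped in `try()` so the error is captured rather than stopping the script:

```r
library(tidyfst)       # assumed source of mutate_dt()
library(nycflights13)  # provides flights

# gain_per_hour refers to gain, which is created in the same call.
# try() captures the error instead of stopping the script.
res <- try(
  mutate_dt(flights,
    gain = arr_delay - dep_delay,
    gain_per_hour = gain / (air_time / 60)
  ),
  silent = TRUE
)

# TRUE when the single-call version failed
inherits(res, "try-error")
```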

Instead, use:

mutate_dt(flights, gain = arr_delay - dep_delay) %>%
  mutate_dt(gain_per_hour = gain / (air_time / 60))
#>         year month day dep_time sched_dep_time dep_delay arr_time
#>      1: 2013     1   1      517            515         2      830
#>      2: 2013     1   1      533            529         4      850
#>      3: 2013     1   1      542            540         2      923
#>      4: 2013     1   1      544            545        -1     1004
#>      5: 2013     1   1      554            600        -6      812
#>     ---                                                          
#> 336772: 2013     9  30       NA           1455        NA       NA
#> 336773: 2013     9  30       NA           2200        NA       NA
#> 336774: 2013     9  30       NA           1210        NA       NA
#> 336775: 2013     9  30       NA           1159        NA       NA
#> 336776: 2013     9  30       NA            840        NA       NA
#>         sched_arr_time arr_delay carrier flight tailnum origin dest air_time
#>      1:            819        11      UA   1545  N14228    EWR  IAH      227
#>      2:            830        20      UA   1714  N24211    LGA  IAH      227
#>      3:            850        33      AA   1141  N619AA    JFK  MIA      160
#>      4:           1022       -18      B6    725  N804JB    JFK  BQN      183
#>      5:            837       -25      DL    461  N668DN    LGA  ATL      116
#>     ---                                                                     
#> 336772:           1634        NA      9E   3393    <NA>    JFK  DCA       NA
#> 336773:           2312        NA      9E   3525    <NA>    LGA  SYR       NA
#> 336774:           1330        NA      MQ   3461  N535MQ    LGA  BNA       NA
#> 336775:           1344        NA      MQ   3572  N511MQ    LGA  CLE       NA
#> 336776:           1020        NA      MQ   3531  N839MQ    LGA  RDU       NA
#>         distance hour minute           time_hour gain gain_per_hour
#>      1:     1400    5     15 2013-01-01 05:00:00    9      2.378855
#>      2:     1416    5     29 2013-01-01 05:00:00   16      4.229075
#>      3:     1089    5     40 2013-01-01 05:00:00   31     11.625000
#>      4:     1576    5     45 2013-01-01 05:00:00  -17     -5.573770
#>      5:      762    6      0 2013-01-01 06:00:00  -19     -9.827586
#>     ---                                                            
#> 336772:      213   14     55 2013-09-30 14:00:00   NA            NA
#> 336773:      198   22      0 2013-09-30 22:00:00   NA            NA
#> 336774:      764   12     10 2013-09-30 12:00:00   NA            NA
#> 336775:      419   11     59 2013-09-30 11:00:00   NA            NA
#> 336776:      431    8     40 2013-09-30 08:00:00   NA            NA

If you only want to keep the new variables, use transmute_dt():
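A minimal sketch, assuming `transmute_dt()` takes the same name-value arguments as `mutate_dt()` (package names are assumptions):

```r
library(tidyfst)       # assumed source of transmute_dt()
library(nycflights13)  # provides flights

# Only the newly computed column is kept
gains <- transmute_dt(flights, gain = arr_delay - dep_delay)
names(gains)
```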

Summarise values with summarise_dt()

summarise_dt(flights,
  delay = mean(dep_delay, na.rm = TRUE)
)
#>       delay
#> 1: 12.63907

Randomly sample rows with sample_n_dt() and sample_frac_dt()

sample_n_dt(flights, 10)
#>     year month day dep_time sched_dep_time dep_delay arr_time sched_arr_time
#>  1: 2013    12   9     1005           1010        -5     1200           1147
#>  2: 2013    10  18     1637           1645        -8     1754           1820
#>  3: 2013    10  13      916            925        -9     1015           1033
#>  4: 2013     5  25     1127           1129        -2     1219           1235
#>  5: 2013    12   2     1755           1800        -5     1937           1919
#>  6: 2013     3   8      849            820        29     1110            944
#>  7: 2013    12   7     1330           1246        44     1614           1538
#>  8: 2013     2  10     1518           1530       -12     1704           1711
#>  9: 2013     7  17     1256           1230        26     1614           1558
#> 10: 2013    11   9      705            710        -5      835            845
#>     arr_delay carrier flight tailnum origin dest air_time distance hour minute
#>  1:        13      UA    258  N460UA    EWR  ORD      139      719   10     10
#>  2:       -26      MQ   3216  N603MQ    JFK  ORF       51      290   16     45
#>  3:       -18      B6   1634  N203JB    JFK  BTV       44      266    9     25
#>  4:       -16      B6   1174  N354JB    EWR  BOS       37      200   11     29
#>  5:        18      US   2158  N955UW    LGA  BOS       35      184   18      0
#>  6:        86      9E   4051  N8932C    JFK  BWI       32      184    8     20
#>  7:        36      B6    383  N594JB    JFK  MCO      140      944   12     46
#>  8:        -7      9E   3719  N8974C    LGA  RIC       53      292   15     30
#>  9:        16      DL   2098  N322NB    LGA  MIA      155     1096   12     30
#> 10:       -10      AA    305  N4YFAA    LGA  ORD      132      733    7     10
#>               time_hour
#>  1: 2013-12-09 10:00:00
#>  2: 2013-10-18 16:00:00
#>  3: 2013-10-13 09:00:00
#>  4: 2013-05-25 11:00:00
#>  5: 2013-12-02 18:00:00
#>  6: 2013-03-08 08:00:00
#>  7: 2013-12-07 12:00:00
#>  8: 2013-02-10 15:00:00
#>  9: 2013-07-17 12:00:00
#> 10: 2013-11-09 07:00:00
sample_frac_dt(flights, 0.01)
#>       year month day dep_time sched_dep_time dep_delay arr_time sched_arr_time
#>    1: 2013     3   7     2006           1955        11     2310           2310
#>    2: 2013     3  27     1441           1445        -4     1657           1710
#>    3: 2013     6   8     1722           1725        -3     1928           1947
#>    4: 2013     1   6     1846           1855        -9     2036           2100
#>    5: 2013     1  29     1109           1114        -5     1316           1315
#>   ---                                                                         
#> 3363: 2013     6  27     1807           1726        41     2108           2009
#> 3364: 2013    11   5      848            853        -5     1150           1207
#> 3365: 2013     9  27     1210           1210         0     1338           1330
#> 3366: 2013     6  27     1239           1229        10     1414           1351
#> 3367: 2013     6  30      942            945        -3     1106           1120
#>       arr_delay carrier flight tailnum origin dest air_time distance hour
#>    1:         0      AA   1709  N3HFAA    LGA  MIA      150     1096   19
#>    2:       -13      MQ   4669  N512MQ    LGA  ATL      106      762   14
#>    3:       -19      UA    280  N461UA    EWR  PHX      279     2133   17
#>    4:       -24      MQ   4649  N537MQ    LGA  MSP      151     1020   18
#>    5:         1      DL   1031  N361NB    LGA  DTW      100      502   11
#>   ---                                                                    
#> 3363:        59      UA   1593  N77430    EWR  LAS      321     2227   17
#> 3364:       -17      UA    354  N426UA    EWR  IAH      206     1400    8
#> 3365:         8      MQ   3461  N527MQ    LGA  BNA       97      764   12
#> 3366:        23      B6   2502  N348JB    JFK  BUF       72      301   12
#> 3367:       -14      WN   2431  N246LV    LGA  MDW      109      725    9
#>       minute           time_hour
#>    1:     55 2013-03-07 19:00:00
#>    2:     45 2013-03-27 14:00:00
#>    3:     25 2013-06-08 17:00:00
#>    4:     55 2013-01-06 18:00:00
#>    5:     14 2013-01-29 11:00:00
#>   ---                           
#> 3363:     26 2013-06-27 17:00:00
#> 3364:     53 2013-11-05 08:00:00
#> 3365:     10 2013-09-27 12:00:00
#> 3366:     29 2013-06-27 12:00:00
#> 3367:     45 2013-06-30 09:00:00

Grouped operations

Consider the following dplyr code:

by_tailnum <- group_by(flights, tailnum)
delay <- summarise(by_tailnum,
  count = n(),
  dist = mean(distance, na.rm = TRUE),
  delay = mean(arr_delay, na.rm = TRUE))
delay <- filter(delay, count > 20, dist < 2000)

The same result can be obtained with:
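One possible translation, sketched under the assumption that `summarise_dt()` accepts data.table's `.N` as the group size and a `by` argument for grouping, and that `filter_dt()` takes a single logical condition:

```r
library(tidyfst)       # assumed source of summarise_dt() and filter_dt()
library(nycflights13)  # provides flights

# Per-plane summary: number of flights, mean distance, mean arrival delay
delay <- summarise_dt(flights,
  count = .N,
  dist = mean(distance, na.rm = TRUE),
  delay = mean(arr_delay, na.rm = TRUE),
  by = tailnum
)

# Keep planes with more than 20 flights and a mean distance under 2000
delay <- filter_dt(delay, count > 20 & dist < 2000)
```

Here `.N` plays the role of dplyr's `n()`, and the grouping is passed through `by` instead of a separate `group_by()` step.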

summarise_dt() (or summarize_dt()) has a by parameter that specifies the grouping variables. For example, we can find the number of planes and the number of flights that go to each possible destination:
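A hedged sketch of that summary. It uses base R's `length(unique())` to count distinct planes (a choice made here, not necessarily the original code) and assumes `.N` gives the group size:

```r
library(tidyfst)       # assumed source of summarise_dt()
library(nycflights13)  # provides flights

# Distinct planes and total flights per destination
per_dest <- summarise_dt(flights,
  planes = length(unique(tailnum)),
  flights = .N,
  by = dest
)
head(per_dest)
```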

If you need to group by many variables, use:
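A minimal sketch, assuming `by` accepts data.table's `.()` list syntax for several grouping variables:

```r
library(tidyfst)       # assumed source of summarise_dt()
library(nycflights13)  # provides flights

# Number of flights per calendar day
per_day <- summarise_dt(flights,
  flights = .N,
  by = .(year, month, day)
)
head(per_day)
```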