Quickly create dummy (binary) columns from character and factor columns in the input data (and from numeric columns if specified). This function is useful for statistical analysis when you want binary columns rather than character columns.

dummy_dt(.data, ..., longname = TRUE)

Arguments

.data

data.frame

...

Columns to create dummy variables from. Selection is flexible: the examples use bare column names as well as a pattern string such as "cyl|gear".

longname

logical. Should the output columns be prefixed with the original column name? Defaults to TRUE.

Value

data.table

Details

If no columns are provided, the original data frame is returned. When NA values exist in an input column, they are treated as a category of their own and get a dummy column. If an input character column contains both NA and the string "NA", the two are merged into a single dummy column.

This function is inspired by the fastDummies package but offers a simpler, more focused interface, whereas fastDummies::dummy_cols provides more features for statistical use.
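
The NA handling described above can be illustrated with a minimal data.table sketch. The helper `make_dummies` is hypothetical and written only for illustration; it is not the tidyfst implementation, and it assumes a single column passed by name as a string. It shows one way that missing values and the string "NA" can end up in the same dummy column.

```r
library(data.table)

# Hypothetical helper for illustration only; this is not the tidyfst source.
make_dummies <- function(dt, col, longname = TRUE) {
  out <- copy(as.data.table(dt))       # work on a copy, leave the input untouched
  values <- as.character(out[[col]])   # factors and numbers become strings
  values[is.na(values)] <- "NA"        # missing values now share a level with "NA"
  for (level in unique(values)) {
    # Prefix with the original column name when longname = TRUE
    new_name <- if (longname) paste(col, level, sep = "_") else level
    set(out, j = new_name, value = as.numeric(values == level))
  }
  out
}

df <- data.table(x = c("a", "b", NA, "NA"), y = 1:4)
res <- make_dummies(df, "x")
# x_NA is 1 for both row 3 (NA) and row 4 (the string "NA")
print(res)
```

Replacing missing values with the string "NA" before comparing is what causes the merge; keeping them apart would require a separate `is.na()` check and a distinct column name.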

References

https://stackoverflow.com/questions/18881073/creating-dummy-variables-in-r-data-table

Examples

iris %>% dummy_dt(Species)
#>      Sepal.Length Sepal.Width Petal.Length Petal.Width   Species Species_setosa
#>             <num>       <num>        <num>       <num>    <fctr>          <num>
#>   1:          5.1         3.5          1.4         0.2    setosa              1
#>   2:          4.9         3.0          1.4         0.2    setosa              1
#>   3:          4.7         3.2          1.3         0.2    setosa              1
#>   4:          4.6         3.1          1.5         0.2    setosa              1
#>   5:          5.0         3.6          1.4         0.2    setosa              1
#>  ---                                                                           
#> 146:          6.7         3.0          5.2         2.3 virginica              0
#> 147:          6.3         2.5          5.0         1.9 virginica              0
#> 148:          6.5         3.0          5.2         2.0 virginica              0
#> 149:          6.2         3.4          5.4         2.3 virginica              0
#> 150:          5.9         3.0          5.1         1.8 virginica              0
#> 2 variables not shown: [Species_versicolor <num>, Species_virginica <num>]
iris %>% dummy_dt(Species, longname = FALSE)
#>      Sepal.Length Sepal.Width Petal.Length Petal.Width   Species setosa
#>             <num>       <num>        <num>       <num>    <fctr>  <num>
#>   1:          5.1         3.5          1.4         0.2    setosa      1
#>   2:          4.9         3.0          1.4         0.2    setosa      1
#>   3:          4.7         3.2          1.3         0.2    setosa      1
#>   4:          4.6         3.1          1.5         0.2    setosa      1
#>   5:          5.0         3.6          1.4         0.2    setosa      1
#>  ---                                                                   
#> 146:          6.7         3.0          5.2         2.3 virginica      0
#> 147:          6.3         2.5          5.0         1.9 virginica      0
#> 148:          6.5         3.0          5.2         2.0 virginica      0
#> 149:          6.2         3.4          5.4         2.3 virginica      0
#> 150:          5.9         3.0          5.1         1.8 virginica      0
#> 2 variables not shown: [versicolor <num>, virginica <num>]

mtcars %>% head() %>% dummy_dt(vs, am)
#>      mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear
#>    <num> <num> <num> <num> <num> <num> <num> <num> <num> <num>
#> 1:  21.0     6   160   110  3.90 2.620 16.46     0     1     4
#> 2:  21.0     6   160   110  3.90 2.875 17.02     0     1     4
#> 3:  22.8     4   108    93  3.85 2.320 18.61     1     1     4
#> 4:  21.4     6   258   110  3.08 3.215 19.44     1     0     3
#> 5:  18.7     8   360   175  3.15 3.440 17.02     0     0     3
#> 6:  18.1     6   225   105  2.76 3.460 20.22     1     0     3
#> 5 variables not shown: [carb <num>, vs_0 <num>, vs_1 <num>, am_1 <num>, am_0 <num>]
mtcars %>% head() %>% dummy_dt("cyl|gear")
#>      mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear
#>    <num> <num> <num> <num> <num> <num> <num> <num> <num> <num>
#> 1:  21.0     6   160   110  3.90 2.620 16.46     0     1     4
#> 2:  21.0     6   160   110  3.90 2.875 17.02     0     1     4
#> 3:  22.8     4   108    93  3.85 2.320 18.61     1     1     4
#> 4:  21.4     6   258   110  3.08 3.215 19.44     1     0     3
#> 5:  18.7     8   360   175  3.15 3.440 17.02     0     0     3
#> 6:  18.1     6   225   105  2.76 3.460 20.22     1     0     3
#> 6 variables not shown: [carb <num>, cyl_6 <num>, cyl_4 <num>, cyl_8 <num>, gear_4 <num>, gear_3 <num>]

# when there are NAs in the column
df <- data.table(x = c("a", "b", NA, NA), y = 1:4)
df %>%
  dummy_dt(x)
#>         x     y   x_a   x_b  x_NA
#>    <char> <int> <num> <num> <num>
#> 1:      a     1     1     0     0
#> 2:      b     2     0     1     0
#> 3:     NA     3     0     0     1
#> 4:     NA     4     0     0     1

# when NA and "NA" both exist, they are merged
df <- data.table(x = c("a", "b", NA, "NA"), y = 1:4)
df %>%
  dummy_dt(x)
#>         x     y   x_a   x_b  x_NA
#>    <char> <int> <num> <num> <num>
#> 1:      a     1     1     0     0
#> 2:      b     2     0     1     0
#> 3:     NA     3     0     0     1
#> 4:     NA     4     0     0     1