Quickly create dummy (binary) columns from character and factor columns in the input data (and from numeric columns, if specified). This is useful for statistical analyses that need binary indicator columns rather than character columns.
dummy_dt(.data, ..., longname = TRUE)
Returns a data.table.
If no columns are provided, the original data frame is returned. Missing values (NA) in an input column are treated as a category of their own and receive a dummy column (e.g. x_NA). If a character column contains both NA and the string "NA", the two are merged into a single dummy column.
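The behavior described above can be illustrated with a minimal base-R sketch. make_dummies() is a hypothetical helper written for illustration only; it is not tidyfst's implementation. It assumes that dummy columns are created in order of first appearance of each value, which matches the column order in the examples below (e.g. am_1 before am_0).

```r
# Hypothetical helper: a base-R sketch of dummy coding, NOT tidyfst's implementation.
make_dummies <- function(df, cols, longname = TRUE) {
  for (col in cols) {
    # Coerce factors and numerics to character so every column type is handled alike
    x <- as.character(df[[col]])
    # Treat missing values as their own category, merged with the string "NA"
    x[is.na(x)] <- "NA"
    # unique() keeps values in order of first appearance
    for (lv in unique(x)) {
      # longname = TRUE gives "col_value"; FALSE uses the bare value as the name
      nm <- if (longname) paste(col, lv, sep = "_") else lv
      df[[nm]] <- as.numeric(x == lv)
    }
  }
  df
}

# Usage: NA and "NA" end up in the same x_NA column
print(make_dummies(data.frame(x = c("a", "b", NA, "NA"), y = 1:4), "x"))
```

With an empty cols vector the loop never runs, so the input is returned as-is, mirroring the "no columns" case.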
This function is inspired by the fastDummies package but offers simpler, more precise usage, whereas fastDummies::dummy_cols provides more features for statistical use.
Reference: https://stackoverflow.com/questions/18881073/creating-dummy-variables-in-r-data-table
See also fastDummies::dummy_cols.
library(tidyfst)
iris %>% dummy_dt(Species)
#> Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#> <num> <num> <num> <num> <fctr>
#> 1: 5.1 3.5 1.4 0.2 setosa
#> 2: 4.9 3.0 1.4 0.2 setosa
#> 3: 4.7 3.2 1.3 0.2 setosa
#> 4: 4.6 3.1 1.5 0.2 setosa
#> 5: 5.0 3.6 1.4 0.2 setosa
#> ---
#> 146: 6.7 3.0 5.2 2.3 virginica
#> 147: 6.3 2.5 5.0 1.9 virginica
#> 148: 6.5 3.0 5.2 2.0 virginica
#> 149: 6.2 3.4 5.4 2.3 virginica
#> 150: 5.9 3.0 5.1 1.8 virginica
#> 3 variable(s) not shown: [Species_setosa <num>, Species_versicolor <num>, Species_virginica <num>]
iris %>% dummy_dt(Species, longname = FALSE)
#> Sepal.Length Sepal.Width Petal.Length Petal.Width Species setosa
#> <num> <num> <num> <num> <fctr> <num>
#> 1: 5.1 3.5 1.4 0.2 setosa 1
#> 2: 4.9 3.0 1.4 0.2 setosa 1
#> 3: 4.7 3.2 1.3 0.2 setosa 1
#> 4: 4.6 3.1 1.5 0.2 setosa 1
#> 5: 5.0 3.6 1.4 0.2 setosa 1
#> ---
#> 146: 6.7 3.0 5.2 2.3 virginica 0
#> 147: 6.3 2.5 5.0 1.9 virginica 0
#> 148: 6.5 3.0 5.2 2.0 virginica 0
#> 149: 6.2 3.4 5.4 2.3 virginica 0
#> 150: 5.9 3.0 5.1 1.8 virginica 0
#> 2 variable(s) not shown: [versicolor <num>, virginica <num>]
mtcars %>% head() %>% dummy_dt(vs, am)
#> mpg cyl disp hp drat wt qsec vs am gear
#> <num> <num> <num> <num> <num> <num> <num> <num> <num> <num>
#> 1: 21.0 6 160 110 3.90 2.620 16.46 0 1 4
#> 2: 21.0 6 160 110 3.90 2.875 17.02 0 1 4
#> 3: 22.8 4 108 93 3.85 2.320 18.61 1 1 4
#> 4: 21.4 6 258 110 3.08 3.215 19.44 1 0 3
#> 5: 18.7 8 360 175 3.15 3.440 17.02 0 0 3
#> 6: 18.1 6 225 105 2.76 3.460 20.22 1 0 3
#> 5 variable(s) not shown: [carb <num>, vs_0 <num>, vs_1 <num>, am_1 <num>, am_0 <num>]
mtcars %>% head() %>% dummy_dt("cyl|gear")
#> mpg cyl disp hp drat wt qsec vs am gear
#> <num> <num> <num> <num> <num> <num> <num> <num> <num> <num>
#> 1: 21.0 6 160 110 3.90 2.620 16.46 0 1 4
#> 2: 21.0 6 160 110 3.90 2.875 17.02 0 1 4
#> 3: 22.8 4 108 93 3.85 2.320 18.61 1 1 4
#> 4: 21.4 6 258 110 3.08 3.215 19.44 1 0 3
#> 5: 18.7 8 360 175 3.15 3.440 17.02 0 0 3
#> 6: 18.1 6 225 105 2.76 3.460 20.22 1 0 3
#> 6 variable(s) not shown: [carb <num>, cyl_6 <num>, cyl_4 <num>, cyl_8 <num>, gear_4 <num>, gear_3 <num>]
# when there are NAs in the column
df <- data.table(x = c("a", "b", NA, NA), y = 1:4)
df %>%
dummy_dt(x)
#> x y x_a x_b x_NA
#> <char> <int> <num> <num> <num>
#> 1: a 1 1 0 0
#> 2: b 2 0 1 0
#> 3: NA 3 0 0 1
#> 4: NA 4 0 0 1
# when both NA and "NA" exist, they are merged into one dummy column
df <- data.table(x = c("a", "b", NA, "NA"), y = 1:4)
df %>%
dummy_dt(x)
#> x y x_a x_b x_NA
#> <char> <int> <num> <num> <num>
#> 1: a 1 1 0 0
#> 2: b 2 0 1 0
#> 3: NA 3 0 0 1
#> 4: NA 4 0 0 1
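In the "cyl|gear" example, the string appears to be matched as a regular expression against the column names, since dummy columns are created for both cyl and gear. This is an inference from the output shown, not documented behavior. Under that assumption, the columns such a pattern would select can be previewed in base R with grep():

```r
# Preview which columns a regex selection like "cyl|gear" would match
# (assumes dummy_dt treats a string argument as a regex on column names)
selected <- grep("cyl|gear", names(mtcars), value = TRUE)
print(selected)
```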