Use UTF-8 for character encoding in a data frame

fread from data.table could not recognize the encoding and return the correct form, this could be unconvenient for text mining tasks. The utf8-encoding could use "UTF-8" as the encoding to override the current encoding of characters in a data frame.

utf8_encoding(.data, .cols)

Arguments

.data: A data.frame.
.cols: The columns you want to convert, usually a character column.

Value

A data.table with characters in UTF-8 encoding

Examples

iris %>%
  as.data.table() %>%
  utf8_encoding(Species)  # could also use `is.factor`
#>      Sepal.Length Sepal.Width Petal.Length Petal.Width   Species
#>             <num>       <num>        <num>       <num>    <char>
#>   1:          5.1         3.5          1.4         0.2    setosa
#>   2:          4.9         3.0          1.4         0.2    setosa
#>   3:          4.7         3.2          1.3         0.2    setosa
#>   4:          4.6         3.1          1.5         0.2    setosa
#>   5:          5.0         3.6          1.4         0.2    setosa
#>  ---                                                            
#> 146:          6.7         3.0          5.2         2.3 virginica
#> 147:          6.3         2.5          5.0         1.9 virginica
#> 148:          6.5         3.0          5.2         2.0 virginica
#> 149:          6.2         3.4          5.4         2.3 virginica
#> 150:          5.9         3.0          5.1         1.8 virginica