fread from data.table sometimes fails to recognize the encoding of character data and returns it in the wrong form, which is inconvenient for text mining tasks. utf8_encoding uses "UTF-8" as the encoding to override the current encoding of characters in a data frame.
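As a rough illustration of what overriding the encoding means, here is a minimal base R sketch. It is not the package's implementation: the helper `mark_utf8` and the sample data are hypothetical and exist only for this example. It assumes the underlying bytes are already valid UTF-8 and only the declared encoding is missing.

```r
# Bytes of "café" in UTF-8, built from raw values so the example does not
# depend on the encoding of this source file or on the current locale.
raw_word <- rawToChar(as.raw(c(0x63, 0x61, 0x66, 0xc3, 0xa9)))

# R has no declared encoding for this string yet.
Encoding(raw_word)

# Hypothetical helper: declare a vector's bytes to be UTF-8.
# The bytes themselves are not changed, only the encoding mark.
mark_utf8 <- function(v) {
  v <- as.character(v)      # factors become character, as in the example output
  Encoding(v) <- "UTF-8"
  v
}

df <- data.frame(word = raw_word, n = 1L, stringsAsFactors = FALSE)
df$word <- mark_utf8(df$word)

Encoding(df$word)
nchar(df$word, type = "chars")
```

Because `Encoding<-` only relabels the string, this approach helps when the bytes are correct but unmarked. Text stored in a different encoding would need an actual conversion, for example with `iconv`.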

utf8_encoding(.data, .cols)

Arguments

.data

A data.frame.

.cols

The columns you want to convert, usually character columns.

Value

A data.table with characters in UTF-8 encoding.

Examples

iris %>% as.data.table() %>% utf8_encoding(Species) # could also use `is.factor`
#>      Sepal.Length Sepal.Width Petal.Length Petal.Width   Species
#>             <num>       <num>        <num>       <num>    <char>
#>   1:          5.1         3.5          1.4         0.2    setosa
#>   2:          4.9         3.0          1.4         0.2    setosa
#>   3:          4.7         3.2          1.3         0.2    setosa
#>   4:          4.6         3.1          1.5         0.2    setosa
#>   5:          5.0         3.6          1.4         0.2    setosa
#>  ---
#> 146:          6.7         3.0          5.2         2.3 virginica
#> 147:          6.3         2.5          5.0         1.9 virginica
#> 148:          6.5         3.0          5.2         2.0 virginica
#> 149:          6.2         3.4          5.4         2.3 virginica
#> 150:          5.9         3.0          5.1         1.8 virginica