fread from data.table may fail to recognize the encoding of its input and
return garbled text, which is inconvenient for text mining tasks.
utf8_encoding sets "UTF-8" as the encoding of the selected character
columns in a data frame, overriding their current encoding.
utf8_encoding(.data, .cols)
Argument | Description |
---|---|
.data | A data.frame. |
.cols | The columns you want to convert, usually a character column. |
A data.table with characters in UTF-8 encoding
#>      Sepal.Length Sepal.Width Petal.Length Petal.Width   Species
#>             <num>       <num>        <num>       <num>    <char>
#>   1:          5.1         3.5          1.4         0.2    setosa
#>   2:          4.9         3.0          1.4         0.2    setosa
#>   3:          4.7         3.2          1.3         0.2    setosa
#>   4:          4.6         3.1          1.5         0.2    setosa
#>   5:          5.0         3.6          1.4         0.2    setosa
#>  ---
#> 146:          6.7         3.0          5.2         2.3 virginica
#> 147:          6.3         2.5          5.0         1.9 virginica
#> 148:          6.5         3.0          5.2         2.0 virginica
#> 149:          6.2         3.4          5.4         2.3 virginica
#> 150:          5.9         3.0          5.1         1.8 virginica
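
The iris data contains only ASCII text, so the output above looks unchanged. To illustrate what overriding the encoding of a character column can mean, here is a minimal base R sketch. The helper mark_utf8 is hypothetical and is not the package's implementation; it assumes the underlying bytes are already valid UTF-8 and only relabels them with Encoding<-, without converting them.

```r
# Hypothetical helper (an assumption for illustration, not utf8_encoding itself):
# declare the selected character columns of a data frame to be UTF-8.
mark_utf8 <- function(df, cols) {
  for (col in cols) {
    x <- df[[col]]
    if (is.character(x)) {
      # Relabel the existing bytes as UTF-8; the bytes themselves are not changed.
      Encoding(x) <- "UTF-8"
      df[[col]] <- x
    }
  }
  df
}

# Build "café" from raw UTF-8 bytes so the string carries no encoding mark,
# similar to text read from a file whose encoding was not recognized.
word <- rawToChar(as.raw(c(0x63, 0x61, 0x66, 0xc3, 0xa9)))
df <- data.frame(word = word, n = 1L, stringsAsFactors = FALSE)

before <- Encoding(df$word)   # encoding mark before the override
out <- mark_utf8(df, "word")
after <- Encoding(out$word)   # encoding mark after the override

print(c(before = before, after = after))
```

Relabeling is only appropriate when the bytes really are UTF-8. Text stored in another encoding would need an actual conversion, for example with iconv, rather than a changed mark.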