I have a large dataset in CSV format that I want to use to build a prediction model. Because of its size, I planned to use the h2o package in R. However, several columns of the data contain Simplified Chinese characters, and h2o does not ingest them correctly.
I've tried two different approaches. The first was to read the file directly with the h2o.importFile() function. However, this converts the Chinese characters into garbled text.
The second approach was to first bring the data into R using readr's read_csv or base R's read.csv. Once the data was loaded correctly into R, I tried to convert the data.frame into an h2o frame with the as.h2o function. However, this also produced garbled Chinese characters in the resulting frame.
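For reference, the read step of the second approach can be sketched roughly as below. This is a minimal sketch using base R only: the temporary file, its contents, and the assumption that the real file is UTF-8 encoded are made up for illustration and are not my actual data.

```r
# Hypothetical stand-in for the real CSV: a tiny UTF-8 file with Chinese text.
# "\u5317\u4eac" is 北京 and "\u4e0a\u6d77" is 上海, written as escapes so the
# script does not depend on the encoding of the source file itself.
tmp <- tempfile(fileext = ".csv")
lines <- enc2utf8(c("x,y", "\u5317\u4eac,1.5", "\u4e0a\u6d77,2.5"))
writeLines(lines, tmp, useBytes = TRUE)  # write the UTF-8 bytes unchanged

# Read the file back and mark the strings as UTF-8 (assumed file encoding).
dat <- read.csv(tmp, encoding = "UTF-8", stringsAsFactors = FALSE)

# Inspect the text column on the R side, before any conversion to h2o.
print(dat$x)
print(nchar(dat$x, type = "chars"))  # number of characters in each value
print(Encoding(dat$x))               # declared encoding of each value
```

If the values print as readable Chinese here, the text is intact in R and the problem only appears at the step where the data is handed over to h2o.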
To illustrate, here is a small reproducible example:
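library(h2o)

# Toy data: one column of Chinese text and one numeric column
dat <- data.frame(x = rep(c("北京", "上海"), 50),
                  y = rnorm(n = 100, mean = 10, sd = 3))

h2o.init(nthreads = -1)   # start a local H2O cluster using all available cores
h2o.dat <- as.h2o(dat)    # the Chinese characters in x come out garbled here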
