I have a Pandas dataframe which has Encoding: latin-1 and is delimited by ;. The dataframe is very large almost of size: 350000 x 3800. I wanted to use sklearn initially but my dataframe has missing values (NAN values) so i could not use sklearn's random forests or GBM. So i had to use H2O's Distributed random forests for the Training of the dataset. The main Problem is the dataframe is not efficiently converted when i do h2o.H2OFrame(data). I checked for the possibility for providing the Encoding Options but there is nothing in the documentation.
Do anyone have an idea about this? Any leads could help me. I also want to know if there are any other libraries like H2O which can handle NAN values very efficiently? I know that we can impute the columns but i should not do that in my dataset because my columns are values from different sensors, if the values are not there implies that the sensor is not present. I can use only Python