The raw data has shape (200000, 15), but after pre-processing and applying `OneHotEncoder` the dimensionality grows to (200000, 300).
The data needs to be trained with Linear Regression, XGBoost, and Random Forest (RF) for predictive modeling. `LabelEncoder` was used earlier, but the results were not satisfactory.
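For reference, the pipeline currently looks roughly like the sketch below; the file name, target column, and hyperparameters are placeholders, and all predictors are one-hot encoded purely for brevity:

```python
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import OneHotEncoder
from xgboost import XGBRegressor

df = pd.read_csv("data.csv")             # ~200000 rows x 15 columns
y = df["target"]                         # hypothetical target column
X_raw = df.drop(columns=["target"])

# One-hot encoding expands the 15 raw columns to ~300 features;
# dense output materialises the full 200000 x 300 array in RAM
# (the parameter is `sparse=False` on older sklearn versions)
encoder = OneHotEncoder(sparse_output=False, handle_unknown="ignore")
X = encoder.fit_transform(X_raw)

for model in (LinearRegression(),
              RandomForestRegressor(n_estimators=100),
              XGBRegressor(n_estimators=100)):
    model.fit(X, y)
```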
The (200000, 300) matrix consumes a whole lot of RAM and throws a `MemoryError` while training.
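For context, a back-of-the-envelope estimate of the dense matrix's footprint (float64 assumed, since that is the default dtype of the dense encoder output):

```python
import numpy as np

# 200000 x 300 float64 values at 8 bytes each:
dense_bytes = 200_000 * 300 * np.dtype(np.float64).itemsize
print(dense_bytes / 1024**3)   # ~0.45 GiB per copy of the matrix

# Preprocessing steps, train/test splits and model fitting can each
# hold extra copies, so peak usage is a multiple of this figure.
```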
- Running on Jupyter Notebook on AWS with 16 GB RAM
- Using `sklearn` for most of the ML part
- Data is in `csv` format (loaded as a `DataFrame` in Python)
Would appreciate any suggestions!