I have data in Parquet format that is too big to fit into memory (6 GB). I am looking for a way to read and process the file using Python 3.6. Is there a way to stream the file, down-sample it as it is read, and save the result to a dataframe? Ultimately, I would like to have the data in dataframe format to work with.
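Roughly, this is the kind of thing I have in mind (just a sketch using pyarrow's row-group API; `'data.parquet'` is a placeholder for my real file, and I don't know whether this actually keeps memory usage bounded):

```python
import pandas as pd
import pyarrow.parquet as pq

# Placeholder path; the real file is ~6 GB.
pf = pq.ParquetFile('data.parquet')

samples = []
for i in range(pf.num_row_groups):
    # Read one row group at a time so only a chunk is in memory.
    chunk = pf.read_row_group(i).to_pandas()
    # Down-sample each chunk, e.g. keep 1% of its rows.
    samples.append(chunk.sample(frac=0.01))

# Combine the down-sampled chunks into one dataframe.
df = pd.concat(samples, ignore_index=True)
```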
Am I wrong to attempt this without using a Spark framework?
I have tried pyarrow and fastparquet, but I get memory errors when trying to read the entire file in at once.
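For reference, these are roughly the calls that fail for me (again with `'data.parquet'` as a placeholder):

```python
import pyarrow.parquet as pq
from fastparquet import ParquetFile

# Both of these raise a MemoryError on the 6 GB file,
# since they materialize the whole dataset at once.
df = pq.read_table('data.parquet').to_pandas()   # pyarrow attempt
df = ParquetFile('data.parquet').to_pandas()     # fastparquet attempt
```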
Any tips or suggestions would be greatly appreciated!