my tasks:
- load matrices whose dimensions are bigger than my RAM from the database, using pandas.read_sql(...) (the database is PostgreSQL)
- operate on the numpy representation of such matrices (bigger than my RAM) using numpy
the problem: I get a MemoryError even when just loading the data from the database.
my temporary quick-and-dirty solution: loop over chunks of the data (importing part of it at a time), so that the RAM can handle the workload (see the sketch below). The issue now is speed: runtime is significantly higher. Before delving into Cython optimization and the like, I wanted to know whether there are existing solutions (either data structures such as the shelve library, or the HDF5 format) that solve the issue.
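
For reference, this is roughly what the chunked workaround looks like. It is a minimal sketch, assuming a hypothetical table `matrix_table` and connection details; `chunksize` makes `pandas.read_sql` return an iterator of DataFrames instead of materializing the whole result set:

```python
import numpy as np
import pandas as pd
from sqlalchemy import create_engine

# hypothetical connection string; adjust to your setup
engine = create_engine("postgresql://user:password@localhost:5432/mydb")

partial_results = []
# with chunksize set, read_sql yields one DataFrame per chunk,
# so only one chunk is ever held in RAM at a time
for chunk in pd.read_sql("SELECT * FROM matrix_table", engine,
                         chunksize=100_000):
    block = chunk.to_numpy()
    # reduce each block immediately and keep only the partial result
    partial_results.append(block.sum(axis=0))

column_totals = np.sum(partial_results, axis=0)
```

This works, but every pass over the data re-runs the query, which is where the slowdown comes from.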
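On the HDF5 idea: one pattern is to pay the database cost once, stream the chunks into an on-disk HDF5 dataset, and then do all numpy work against that file with slicing, which reads only the touched region into RAM. A sketch using h5py (pandas' HDFStore would also work); the shape, file name, and table name are assumptions:

```python
import h5py
import numpy as np
import pandas as pd
from sqlalchemy import create_engine

engine = create_engine("postgresql://user:password@localhost:5432/mydb")
n_rows, n_cols = 10_000_000, 50  # assumed full matrix dimensions

# one-time export: stream query chunks into a chunked on-disk dataset
with h5py.File("matrix.h5", "w") as f:
    dset = f.create_dataset("matrix", shape=(n_rows, n_cols),
                            dtype="float64", chunks=True)
    offset = 0
    for chunk in pd.read_sql("SELECT * FROM matrix_table", engine,
                             chunksize=100_000):
        block = chunk.to_numpy()
        dset[offset:offset + len(block), :] = block
        offset += len(block)

# later passes hit the local file, not the database
with h5py.File("matrix.h5", "r") as f:
    dset = f["matrix"]
    col_means = np.zeros(n_cols)
    # numpy-style slicing loads only that slab from disk
    for start in range(0, n_rows, 100_000):
        col_means += dset[start:start + 100_000, :].sum(axis=0)
    col_means /= n_rows
```

Repeated passes over a local HDF5 file are typically much faster than repeated chunked queries, since subsequent reads avoid the network and query-execution overhead entirely.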