I have data as below:
ID  date    Net Total       Class
11  201706  XN  0.607500    P
53  201709  M9  0.989722    V
68  201709  FM  3.736944    P
61  201701  ZK  1.121388    B
17  201705  F   1.969722    V
This is a huge table (0.5 billion records) in PosteGreSQL and I need to pull a subset of it into Python as below:
sql_subset = str("""select * from table_name where date in 
                              (201709,201710,201711,201712) and Class = 'V';""")
df_sub = pd.read_sql(sql_subset,cnxn)
Using read_sql gives memory error. This subsetted query is around 0.58 million records and I am looking at a way to import this into python.
Things I have considered:
- HDF5 and PyTables - Not sure how to pull data from PostgreSQL into HDF5/PyTables using Python.
- Splitting into csv for each month and then combining them into a pandas dataframe - but wouldn't that still give memory error. - Can someone please point me in the right direction for solving this? 
 
    