I have a csv file which is more than 1 GB in size. Usually this is how I convert a csv to a table using pandas.
import pandas as pd
from sqlalchemy import create_engine
file_path = "/path/to/test.csv"
df = pd.read_csv(file_path)
for col in df.columns:
    # convert each column's values to numeric
    df[col] = pd.to_numeric(df[col], errors='coerce')
engine = create_engine('postgresql://username:password@localhost:port/dbname')
df.to_sql("mytable", engine)
As you can see, I load the csv file, perform an operation on each column, and dump it to a table.
Now, since my file is very large, pandas is unable to load it into a data frame. So I looked up a solution online that talked about loading the data in chunks and performing operations on it (Using pandas to efficiently read in a large CSV file without crashing). This is what I came up with:
file_path = "/path/to/test.csv" 
chunksize = 100000
for chunk in pd.read_csv(file_path, chunksize=chunksize, iterator=True, low_memory=False):
    columns = chunk.columns
It gives me the columns of every chunk. So does chunksize mean it reads n rows at a time? I can't quite figure out how to make sure I cover all the chunks and keep appending the data to the table, so that in the end I have all the data in the table, just like I do with smaller csv files.
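Here is roughly what I'm imagining, but I'm not sure it is correct; the if_exists handling and index=False are my own guesses, not something from the example I followed:

import pandas as pd
from sqlalchemy import create_engine

file_path = "/path/to/test.csv"
chunksize = 100000
engine = create_engine('postgresql://username:password@localhost:port/dbname')

for i, chunk in enumerate(pd.read_csv(file_path, chunksize=chunksize, low_memory=False)):
    # apply the same per-column conversion as before, but to each chunk
    for col in chunk.columns:
        chunk[col] = pd.to_numeric(chunk[col], errors='coerce')
    # my guess: replace any existing table on the first chunk, append the rest
    if_exists = 'replace' if i == 0 else 'append'
    chunk.to_sql("mytable", engine, if_exists=if_exists, index=False)

Is this how the chunks are supposed to be combined, or is there a better pattern for appending each chunk to the same table?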