I'm reading a CSV file that is about 1 GB on disk, but it winds up taking over 10 GB of memory. Since DictReader returns an iterator of dicts, each keyed by the fields of the header row, I could imagine the parsed rows taking up roughly twice as much space as the raw file, but ten times as much? This confuses me.
import csv

def readeverything(filename):
    thefile = open(filename)
    reader = csv.DictReader(thefile, delimiter='\t')
    lines = []
    for datum in reader:
        # keep every parsed row (a dict keyed by the header fields)
        lines.append(datum)
    thefile.close()
    return lines
The raw string for a line is actually smaller than the parsed dict; I checked this with sys.getsizeof on the first line of the file and on the first record returned by csv.DictReader. So the per-record size of the dictionaries alone does not account for the tenfold blow-up in memory when reading the CSV.
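Roughly, the check looked like this (a minimal sketch; 'data.tsv' stands in for my actual file):

import csv
import sys

def compare_first_row(filename):
    # size of the first raw line as read straight from the file
    with open(filename) as f:
        first_line = f.readline()
    print('raw line:', sys.getsizeof(first_line))

    # size of the first record parsed by csv.DictReader
    with open(filename) as f:
        reader = csv.DictReader(f, delimiter='\t')
        first_record = next(reader)
    print('parsed dict:', sys.getsizeof(first_record))

compare_first_row('data.tsv')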