
I am currently using Python with matplotlib to display a .csv file of more than 440k lines, but it takes 11 seconds to display only one column. My .csv always has the same format. Is there a way to parse it more quickly? I chose to store each column in a list and then display it.

Here is the code I wrote:

import csv

csv_path = "C:/Users/mydata.csv"

name_list = []
int_list = []

# open() takes no delimiter argument; newline="" is what the csv module expects
with open(csv_path, newline="") as csv_database:
    # DictReader consumes the heading row itself, so no manual skip is needed
    data_dict = csv.DictReader(csv_database, delimiter=";")

    for row in data_dict:
        # Here I add to a list of strings
        name_list.append(row["Name"])  # Assuming the header of the column is "Name"

        # Here I add to a list of integers; NULL or blank cells become 0
        number = row["Number"].strip()
        if number == "NULL" or number == "":
            int_list.append(0)
        else:
            int_list.append(int(number))  # Assuming the header is "Number"
Jean
  • 173

1 Answer


Looks OK to me. This will work for small-scale CSVs (fewer than a couple of thousand rows).

When I had to parse huge CSV files (100k+ rows), I used Cassava, a Haskell CSV library, which outperformed the native modules by a long way.

Take a look at http://hackage.haskell.org/package/cassava
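
If staying in Python is preferable, here is a minimal sketch of a different technique from the one above: reading with the standard library's csv.reader and looking columns up by position, which avoids building a dictionary for every row. The function name load_columns is hypothetical, the "Name" and "Number" headers and the ";" delimiter are taken from the question, and no timings are claimed here, so measure it on your own file.

```python
import csv


def load_columns(csv_path, delimiter=";"):
    """Return (names, numbers) read from the "Name" and "Number" columns."""
    with open(csv_path, newline="") as handle:
        reader = csv.reader(handle, delimiter=delimiter)
        header = next(reader)  # first row holds the column names
        name_idx = header.index("Name")      # assumed header, as in the question
        number_idx = header.index("Number")  # assumed header, as in the question

        names = []
        numbers = []
        for row in reader:
            if not row:
                continue  # skip completely empty lines
            names.append(row[name_idx])
            value = row[number_idx].strip()
            # Treat NULL and blank cells as 0, as the question's code intends
            numbers.append(0 if value in ("", "NULL") else int(value))
    return names, numbers
```

Calling names, numbers = load_columns("C:/Users/mydata.csv") would give two plain lists, and the numbers list can be passed straight to matplotlib.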

Hope this helps

Fazer87
  • 12,965