I'm working on a project involving Twitter data. I have several hundred thousand tweets downloaded and stored in files. The data came back in JSON format, and the stream consumer I was using converted each tweet to a Python dictionary, so it's all stored in text files, one tweet per line, as Python dictionaries.
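To make the setup concrete, here's roughly how I read one of these files back in. This is a sketch (the function and file name are made up); since each line is the `repr()` of a dict rather than JSON, I'm assuming `ast.literal_eval` is the right way to parse it safely:

```python
import ast

def load_tweets(path):
    """Parse a file with one tweet per line, each line being repr() of a dict.

    ast.literal_eval evaluates only Python literals (dicts, strings,
    numbers, ...), so unlike eval() it can't run arbitrary code.
    """
    tweets = []
    with open(path) as f:
        for line in f:
            line = line.strip()
            if line:  # skip blank lines
                tweets.append(ast.literal_eval(line))
    return tweets
```

One caveat I've noticed: because the lines are dict reprs and not JSON, `json.loads` chokes on the single quotes, which is why I parse them this way.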
There is a lot of extraneous information, so I have a Python script that reads each line in as a dict and extracts some useful fields. What would be the best way to store this data now that it's extracted? I was writing it back out to CSV files, but I've run into some issues with that, and I've come across people who feel that CSV isn't the best way to store it.
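For reference, my extraction step currently looks something like this (a sketch, not my exact script; the field names here are hypothetical stand-ins for the handful of fields I actually pull out):

```python
import ast
import csv

# Hypothetical extracted fields -- my real script keeps a similar handful.
FIELDS = ["id", "created_at", "text"]

def extract_to_csv(in_path, out_path):
    """Read one-dict-per-line tweet files and write selected fields to CSV."""
    with open(in_path) as src, open(out_path, "w", newline="") as dst:
        writer = csv.DictWriter(dst, fieldnames=FIELDS)
        writer.writeheader()
        for line in src:
            line = line.strip()
            if line:
                tweet = ast.literal_eval(line)
                # Keep only the fields I care about; default to "" if missing.
                writer.writerow({k: tweet.get(k, "") for k in FIELDS})
```

Part of my trouble is that tweet text can contain commas, quotes, and newlines, which is where the CSV issues started.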
What would be the most effective way of storing this data? I will need to access it to find patterns, match similar items, and so on. I was thinking of using a database - is that the best option, or are there better alternatives?
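In case it helps to know what I had in mind: something like sqlite3 from the standard library, since it needs no server. This is only a sketch of the idea (the schema and names are hypothetical), not something I've committed to:

```python
import sqlite3

def build_db(db_path, tweets):
    """Load extracted tweet records into a SQLite table for later querying.

    Hypothetical schema: one row per tweet with the fields I extract.
    """
    con = sqlite3.connect(db_path)
    con.execute(
        "CREATE TABLE IF NOT EXISTS tweets ("
        "id INTEGER PRIMARY KEY, created_at TEXT, text TEXT)"
    )
    con.executemany(
        "INSERT OR REPLACE INTO tweets VALUES (?, ?, ?)",
        [(t["id"], t.get("created_at", ""), t.get("text", "")) for t in tweets],
    )
    con.commit()
    return con
```

The appeal is that pattern-finding could then be SQL queries (e.g. `WHERE text LIKE ?`) instead of re-reading flat files, but I don't know if that's the right tool at this scale.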