How to store processed data?

Question

I'm working on a project involving Twitter data. I have several hundreds thousand tweets downloaded and stored in files. The data was returned in json format, and the stream consumer I was using converted them to python dictionaries, so I it's all stored in text files, one tweet per line, as python dictionaries.

There is a lot of extraneous information, so I have a python script that reads each line in as a dict and extracts some useful information. What would be the best way to store this data now that's it's extracted? I was printing it back out to csv files, but I've been having some issues with that and have come across some people who seem to feel that that's not the best way to store it.

What would be the most effective way of storing this data? I will need to access it to find patterns, match similar items, etc. I was thinking of using a database - is that the best option? Are there others that are better?

Journeyman Geek · Accepted Answer · 2013-03-07T13:57:10.367

If its just key pair stores, apparently nosql style databases work well - twitter does use these, and they might be a great fit if you need to handle a lot of data with very little structure. You could probably use a traditional rdbms, or maybe an embedded sqlite db if there are more than a simple key pair store and had structured data with relations.

It might also help to understand the weakness of a flat file store (no transaction logging or structure) , nosql (no ACID) and a traditional db(bulky, less scalable but well understood and often reasonably fast). With a small non updated set of data, any of them should work

How to store processed data?

1 Answers1