The problem is to save a dictionary for data analysis in a way that scales. I am performing 10,000 searches, and based on the results I save a dictionary for every query. In the end, I have a dictionary like the following:
{
 'query_1': {'has_result': True,      # or False
             'direct_result': True,   # or False
             'title': "title_1",
             'summary': "summary_1",
             'infobox': {'header_11': "data_11",
                         'header_12': "data_12",
                         # ...
                        }},
 'query_2': {'has_result': True,      # or False
             'direct_result': True,   # or False
             'title': "title_2",
             'summary': "summary_2",
             'infobox': {'header_21': "data_21",
                         'header_22': "data_22",
                         # ...
                        }},
 # ...
}
The problematic part is obviously 'infobox'. I don't know in advance how many key-value pairs each 'infobox' will have (usually no more than 50), and the keys are expected to differ from one infobox to the next.
Right now, the only way I can think of to save the data as a CSV is the following:
+---------+------------+---------------+---------+-----------+----------------+--------------+
|  query  | has_result | direct_result |  title  |  summary  | infobox_header | infobox_data |
+---------+------------+---------------+---------+-----------+----------------+--------------+
| query_1 | TRUE       | TRUE          | title_1 | summary_1 | header_1       | data_1       |
| query_1 | TRUE       | TRUE          | title_1 | summary_1 | header_2       | data_2       |
| query_1 | TRUE       | TRUE          | title_1 | summary_1 | header_3       | data_3       |
| query_1 | TRUE       | TRUE          | title_1 | summary_1 | header_4       | data_4       |
| query_1 | TRUE       | TRUE          | title_1 | summary_1 | header_5       | data_5       |
| query_2 | TRUE       | FALSE         | title_2 | summary_2 | header_1       | data_1       |
| query_2 | TRUE       | FALSE         | title_2 | summary_2 | header_2       | data_2       |
| query_2 | TRUE       | FALSE         | title_2 | summary_2 | header_3       | data_3       |
| query_2 | TRUE       | FALSE         | title_2 | summary_2 | header_4       | data_4       |
+---------+------------+---------------+---------+-----------+----------------+--------------+
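For reference, the long ("one row per infobox entry") layout in the table above could be produced with something like the following. This is a minimal sketch assuming the nested dictionary shape shown earlier; the function name `write_long_csv` is hypothetical, and keeping infobox-less queries as one row with blank header/data cells is my own assumption.

```python
import csv
import io

def write_long_csv(results, fh):
    """Hypothetical helper: write one CSV row per (query, infobox header)
    pair, repeating the query-level fields on every row."""
    writer = csv.writer(fh)
    writer.writerow(["query", "has_result", "direct_result", "title",
                     "summary", "infobox_header", "infobox_data"])
    for query, info in results.items():
        base = [query, info["has_result"], info["direct_result"],
                info["title"], info["summary"]]
        infobox = info.get("infobox") or {}
        if not infobox:
            # Assumption: keep queries without an infobox as a single row.
            writer.writerow(base + ["", ""])
        for header, data in infobox.items():
            writer.writerow(base + [header, data])

# Example with two made-up queries.
results = {
    "query_1": {"has_result": True, "direct_result": True,
                "title": "title_1", "summary": "summary_1",
                "infobox": {"header_1": "data_1", "header_2": "data_2"}},
    "query_2": {"has_result": True, "direct_result": False,
                "title": "title_2", "summary": "summary_2",
                "infobox": {}},
}
buf = io.StringIO()
write_long_csv(results, buf)
print(buf.getvalue())
```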
The problem with my solution is that 'title' and 'summary' are strings, and they are repeated on every infobox row of the same query. For 10,000 queries this is not a big deal: I end up with roughly 200,000 rows. But I wonder whether, in principle, this is the best way to store this dictionary for data analysis.
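One common way to avoid repeating 'title' and 'summary' is to normalize the data into two tables: one row per query, and one row per infobox entry linked back to its query. A minimal sketch under that assumption (the function name `split_tables` is hypothetical, and CSV output is just one possible target):

```python
import csv
import io

def split_tables(results, queries_fh, infobox_fh):
    """Hypothetical helper: normalize the nested dict into two CSV tables.
    queries table: one row per query (title/summary stored once).
    infobox table: one row per infobox entry, keyed by the query."""
    q = csv.writer(queries_fh)
    ib = csv.writer(infobox_fh)
    q.writerow(["query", "has_result", "direct_result", "title", "summary"])
    ib.writerow(["query", "header", "data"])
    for query, info in results.items():
        q.writerow([query, info["has_result"], info["direct_result"],
                    info["title"], info["summary"]])
        for header, data in (info.get("infobox") or {}).items():
            ib.writerow([query, header, data])

# Example with two made-up queries.
results = {
    "query_1": {"has_result": True, "direct_result": True,
                "title": "title_1", "summary": "summary_1",
                "infobox": {"header_11": "data_11", "header_12": "data_12"}},
    "query_2": {"has_result": True, "direct_result": False,
                "title": "title_2", "summary": "summary_2",
                "infobox": {"header_21": "data_21"}},
}
queries_buf, infobox_buf = io.StringIO(), io.StringIO()
split_tables(results, queries_buf, infobox_buf)
```

The two tables can be joined on the `query` column whenever the combined view is needed, so each long string is stored only once.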
What if in the future I use 100,000 or 1,000,000 queries? How would you go about solving this problem? Would you use a different data structure from the beginning? And how would you make it ready for data analysis?
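One possible direction at larger scale, sketched here only as an assumption rather than a recommendation, is to write each query's result into a database as it arrives instead of building the whole dictionary in memory. The sketch below uses Python's built-in `sqlite3` with the same two-table layout; the functions `init_db` and `save_result` are hypothetical names.

```python
import sqlite3

def init_db(path=":memory:"):
    """Hypothetical helper: create one table per entity plus an index
    on infobox headers for faster grouping and filtering."""
    conn = sqlite3.connect(path)
    conn.executescript("""
        CREATE TABLE IF NOT EXISTS queries (
            query TEXT PRIMARY KEY,
            has_result INTEGER,
            direct_result INTEGER,
            title TEXT,
            summary TEXT
        );
        CREATE TABLE IF NOT EXISTS infobox (
            query TEXT REFERENCES queries(query),
            header TEXT,
            data TEXT
        );
        CREATE INDEX IF NOT EXISTS idx_infobox_header ON infobox(header);
    """)
    return conn

def save_result(conn, query, info):
    """Hypothetical helper: store one query's result immediately, so the
    full dictionary never has to be held in memory."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO queries VALUES (?, ?, ?, ?, ?)",
            (query, info["has_result"], info["direct_result"],
             info["title"], info["summary"]))
        conn.executemany(
            "INSERT INTO infobox VALUES (?, ?, ?)",
            [(query, h, d) for h, d in (info.get("infobox") or {}).items()])

conn = init_db()
save_result(conn, "query_1", {"has_result": True, "direct_result": True,
                              "title": "title_1", "summary": "summary_1",
                              "infobox": {"header_1": "data_1",
                                          "header_2": "data_2"}})
# Example analysis: how often each infobox header appears.
counts = conn.execute(
    "SELECT header, COUNT(*) FROM infobox GROUP BY header ORDER BY header"
).fetchall()
print(counts)  # [('header_1', 1), ('header_2', 1)]
```

For analysis, either table (or a join of both) can then be loaded into a DataFrame, for example with `pandas.read_sql_query`.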