I am not sure how to word this properly, so I will try to explain this with a replicable example.
I have thousands of entries in a pandas.DataFrame object. I want to export each row as its own json file with a few keys that are not explicitly available in the data frame's structure.
My data frame, df, looks as follows:
> df = pd.DataFrame({'ID':['0','1','2'],
'NAME':['jimmy','james','joben'],
'LAST':['neutron','baxter','segel'],
'ADDRESS':['101 ocean avenue','202 bubble gum county','303 china town'],
'COUNTY':['yellow card park','candy kingdom','universal studio']})
| ID | NAME | LAST | ADDRESS | COUNTY |
|---|---|---|---|---|
| 0 | jimmy | neutron | 101 ocean avenue | yellow card park |
| 1 | james | baxter | 202 bubble gum county | candy kingdom |
| 2 | joben | segel | 303 china town | universal studio |
I want to transform each row to json files with the following structure:
- 'ID'
- 'PERSONAL', which encloses
NAMEandLAST - 'GEO', which encloses
ADDRESSandCOUNTY
In other words, I would like the first transformed row to look as follows:
{
'ID':'0',
'PERSONAL':
{
'NAME':'jimmy',
'LAST':'neutron'
},
'GEO':
{
'ADDRESS':'101 ocean avenue',
'COUNTY':'yellow card park'
}
}
I was thinking of converting df into a pandas.MultiIndex structure to add the PERSONAL and GEO indices and then export each row with pandas.DataFrame.to_json. However, I'm not sure if this will work and wanted to ask around prior to trying this.