I have a nested dictionary `annot_dict` with this structure:

- key = long unique string
- value = list of dictionaries

Each dictionary in that list has this structure:

- key = long unique string (a subcategory of the outer dictionary's key)
- value = list of five strings
An example of the entire structure is:

```python
annot_dict['ID_string'] = [
    {'ID_string': ['attr1a', 'attr1b', 'attr1c', 'attr1d', 'attr1e']},
    {'string2':   ['attr2a', 'attr2b', 'attr2c', 'attr2d', 'attr2e']},
    {'string3':   ['attr3a', 'attr3b', 'attr3c', 'attr3d', 'attr3e']},
]
```
The outer key `ID_string` is the same as the key of the first sub-dictionary. This is the output of a GFF3 parser function I wrote; the real data are the genes (`ID_string`) and transcripts (`string2`, `string3`, ...) annotated on human chromosome 9, if anyone is familiar with that file format. The attribute lists describe biotype, start index, end index, strand, and description.
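One way to walk this structure is a generator that flattens each gene's list of one-key dictionaries into flat row dictionaries, tagging every row with its parent gene ID. This is a minimal sketch, not my parser's code: `iter_rows` and `ATTR_COLS` are hypothetical names, and `ATTR_COLS` assumes each five-item list is ordered the way the desired table below lays it out (start, end, strand, biotype, description). If the parser emits the attributes in another order, `ATTR_COLS` would need to change to match.

```python
from typing import Dict, Iterator, List

# Assumed order of the five attributes in each list (matches the target table).
ATTR_COLS = ["start_index", "end_index", "strand", "biotype", "desc"]


def iter_rows(annot_dict: Dict[str, List[Dict[str, List[str]]]]) -> Iterator[Dict[str, str]]:
    """Yield one flat row per sub-dictionary, tagged with its parent gene ID."""
    for gene_id, subunits in annot_dict.items():  # outer keys: gene IDs
        for sub in subunits:                       # each item is a one-key dict
            for subunit_id, attrs in sub.items():
                row = {"subunit_ID": subunit_id, "gene_ID": gene_id}
                row.update(zip(ATTR_COLS, attrs))  # pair names with the five values
                yield row


annot_dict = {
    "ID_string": [
        {"ID_string": ["attr1a", "attr1b", "attr1c", "attr1d", "attr1e"]},
        {"string2": ["attr2a", "attr2b", "attr2c", "attr2d", "attr2e"]},
        {"string3": ["attr3a", "attr3b", "attr3c", "attr3d", "attr3e"]},
    ]
}

rows = list(iter_rows(annot_dict))
```

Because the gene's own entry is the first sub-dictionary, the gene row comes out first, followed by its transcripts, in list order.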
Now I want to put this information into a pandas DataFrame. I want to loop through the outermost keys (the `ID_string`s) and build one big DataFrame containing a row for each `ID_string`, followed by a row for each of its subcategories (`string2`, `string3`, ...).
I want it to look like this:
| subunit_ID | gene_ID | start_index | end_index | strand |biotype | desc |
|------------|-----------|-------------|-----------|--------|--------|--------|
|'ID_string' |'ID_string'| 'attr1a' | 'attr1b' |'attr1c'|'attr1d'|'attr1e'|
| 'string2' |'ID_string'| 'attr2a' | 'attr2b' |'attr2c'|'attr2d'|'attr2e'|
| 'string3' |'ID_string'| 'attr3a' | 'attr3b' |'attr3c'|'attr3d'|'attr3e'|
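A sketch of building exactly this table with pandas: collect one tuple per subunit in a nested comprehension, then call the `DataFrame` constructor once with explicit column names. As above, the column order of the five attributes is an assumption (that each list is already ordered start, end, strand, biotype, description, as in the table); `ATTR_COLS` is a hypothetical name to adjust if the parser's order differs.

```python
import pandas as pd

# Assumed order of the five attributes in each list (matches the target table).
ATTR_COLS = ["start_index", "end_index", "strand", "biotype", "desc"]

annot_dict = {
    "ID_string": [
        {"ID_string": ["attr1a", "attr1b", "attr1c", "attr1d", "attr1e"]},
        {"string2": ["attr2a", "attr2b", "attr2c", "attr2d", "attr2e"]},
        {"string3": ["attr3a", "attr3b", "attr3c", "attr3d", "attr3e"]},
    ]
}

# One tuple per subunit: (subunit_ID, gene_ID, five attributes...).
records = [
    (subunit_id, gene_id, *attrs)
    for gene_id, subunits in annot_dict.items()  # loop over genes
    for sub in subunits                           # loop over one-key dicts
    for subunit_id, attrs in sub.items()
]

df = pd.DataFrame(records, columns=["subunit_ID", "gene_ID", *ATTR_COLS])
print(df)
```

Building a plain list first and constructing the DataFrame once is generally much faster than adding rows to a DataFrame inside the loop, since each such addition copies the whole frame.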
I looked at other answers, but none had quite the same dict structure as mine. This is my first question on SO, so please feel free to edit it for clarity.