My sample data is as follows:
sample_json = """{
"P1":[
{"Question":"Fruit",
"Choices":["Yes","No"]}
],
"P2":[
{"Question":"Fruit Name",
"Choices":["Mango","Apple","Banana"]}
],
"P3":[
{"Question":"Fruit color",
"Choices":["Yellow","Red"]}
],
"P4":[
{"Question":"Vegetable",
"Choices":["Yes","No"]}
],
"P5":[
{"Question":"Veggie Name",
"Choices":["Tomato","Potato","Carrots"]}
],
"P6":[
{"Question":"Veggie Color",
"Choices":["Red","Yellow","Brown"]}
],
"P7":[
{"Question":"Enjoy Eating?",
"Choices":["Yes","No"]}
]
}"""
I am trying to generate a data frame using pandas as follows:
import json, random
import pandas as pd
sample_data = json.loads(sample_json)
colHeaders = []
for k,v in sample_data.items():
    colHeaders.append(v[0]['Question'])
df = pd.DataFrame(columns= colHeaders)
for i in range (10):
    Answers = []
    for k,v in sample_data.items():
        Answers.append(random.choice(v[0]['Choices']))
    df.loc[len(df)] = Answers
This creates a data frame with ten rows in which every cell is an independent random choice for its question.
Although the answers are random, I want to populate the rows based on P1 and P4 under the following conditions
(P.S.: P1, P2, P3, ..., P7 are the keys in sample_json):
- If `P1.AnswerChoice = No`, fill `Null` into P2 and P3
- If `P4.AnswerChoice = No`, fill `Null` into P5 and P6
- If `P1.AnswerChoice = No` and `P4.AnswerChoice = No`, fill `Null` into P7
- Both `P1.AnswerChoice` and `P4.AnswerChoice` cannot be `Yes`
So that it can produce the following data frame:
| Fruit | Fruit Name | Fruit color | Vegetable | Veggie Name | Veggie Color | Enjoy Eating? |
|---|---|---|---|---|---|---|
| No | Null | Null | Yes | Carrots | Yellow | No |
| No | Null | Null | No | Null | Null | Null |
| Yes | Apple | Yellow | No | Null | Null | Yes |
| Yes | Banana | Yellow | No | Null | Null | No |
| No | Null | Null | Yes | Potato | Yellow | No |
| No | Null | Null | Yes | Tomato | Yellow | Yes |
| Yes | Mango | Red | No | Null | Null | Null |
| No | Null | Null | Yes | Carrots | Yellow | No |
| Yes | Apple | Yellow | No | Null | Null | No |
Edit:
I would like to handle this inside the for loop that iterates over the JSON and prepares each row, rather than editing the data frame afterwards.
For example, in the following part of the code, if possible:
for i in range (10):
    Answers = []
    for k,v in sample_data.items():
        Answers.append(random.choice(v[0]['Choices']))
    df.loc[len(df)] = Answers
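
One possible way to apply the four conditions while the row is being built, as a minimal sketch rather than a definitive solution. The names `make_row`, `DEPENDENTS` and `NULL` are hypothetical and not part of the code above. Two assumptions are made: `Null` is stored as the literal string "Null", and when P1 and P4 both come out as Yes, one of the two is flipped to No at random (re-drawing the pair until it is valid would be another option, with a different distribution).

```python
import json
import random

import pandas as pd

# Same structure as sample_json above, written compactly.
sample_json = """{
"P1":[{"Question":"Fruit","Choices":["Yes","No"]}],
"P2":[{"Question":"Fruit Name","Choices":["Mango","Apple","Banana"]}],
"P3":[{"Question":"Fruit color","Choices":["Yellow","Red"]}],
"P4":[{"Question":"Vegetable","Choices":["Yes","No"]}],
"P5":[{"Question":"Veggie Name","Choices":["Tomato","Potato","Carrots"]}],
"P6":[{"Question":"Veggie Color","Choices":["Red","Yellow","Brown"]}],
"P7":[{"Question":"Enjoy Eating?","Choices":["Yes","No"]}]
}"""
sample_data = json.loads(sample_json)

# Hypothetical lookup: questions to blank out when the gate question is "No".
DEPENDENTS = {"P1": ["P2", "P3"], "P4": ["P5", "P6"]}
NULL = "Null"  # assumption: the literal string "Null", as in the desired table


def make_row(data):
    """Build one row (dict keyed by P1..P7) that satisfies the four conditions."""
    # Start from a fully random row, exactly like the original loop.
    row = {key: random.choice(val[0]["Choices"]) for key, val in data.items()}

    # P1 and P4 cannot both be "Yes": flip one of them (assumed resolution).
    if row["P1"] == "Yes" and row["P4"] == "Yes":
        row[random.choice(["P1", "P4"])] = "No"

    # P1 = No blanks P2 and P3; P4 = No blanks P5 and P6.
    for gate, deps in DEPENDENTS.items():
        if row[gate] == "No":
            for dep in deps:
                row[dep] = NULL

    # P1 = No and P4 = No blanks P7.
    if row["P1"] == "No" and row["P4"] == "No":
        row["P7"] = NULL
    return row


colHeaders = [val[0]["Question"] for val in sample_data.values()]
df = pd.DataFrame(columns=colHeaders)
for i in range(10):
    row = make_row(sample_data)
    # Keep the column order of the JSON keys when appending the row.
    df.loc[len(df)] = [row[key] for key in sample_data]
print(df)
```

Keeping the row as a dict keyed by P1..P7 until the last step means the conditions can refer to questions by key instead of by list position, which is what the original `Answers` list would require.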
