I'm trying to build a section of a program that iterates over each row of a dataframe, inspects each column in that row, and outputs a result that depends on the column's contents.
Dataframe:
import pandas as pd
df = pd.DataFrame({'Key': ['1234', '4321', '2341', '4132'],
                   'Value1': ['JFK', 'LAX', 'ATL', 'CLT'],
                   'Value2': ['NYC', 'CA', 'GA', 'NC'],
                   'Value3': ['Yes', 'No', 'No', 'No']})
For each row, I want to take the key, check the contents of Value1 and produce output based on an if statement, then do the same for Value2 and Value3.
The IF statement logic would look similar to:
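# iterrows() yields (index, row) pairs; each row is a Series keyed by column name
for _, row in df.iterrows():
    if row['Value1'] == 'JFK':
        print('John F Kennedy')
    else:
        print('N/A')
    if row['Value2'] == 'NYC':
        print('New York City')
    else:
        print('N/A')
    if row['Value3'] == 'Yes':
        print('Able')
    elif row['Value3'] == 'No':
        print('Unable')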
I would like the output to be a dataframe as well, one that pairs each key with a concatenation of all the print statements produced by the logic above. For the first row, it would look something like this:
result_df = pd.DataFrame({'Key': ['1234'],
                          'Result': ['John F Kennedy, New York City, Able']})
I've condensed the logic above; in reality there would be many more conditions to meet.
Any help would be great. I just need a push in the right direction to get pandas to iterate through the columns of a row.
Thank you!
Resolution:
Since my question might not have been the clearest, I dug into numpy and pandas a bit more and found a solution for anyone who runs into a similar issue.
I wanted a way to go through a dataframe and, based on the contents of specific columns, create another column holding the results. This essentially mimics an IF statement in Excel. Several posters had discouraged using iterrows(), so I didn't go down that route. Instead, I found np.select() in numpy.
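For anyone who hasn't used it, here is a minimal sketch of how np.select() behaves on its own. The scores array and the labels are made up purely for illustration. Conditions are checked in order, the first one that matches wins, and default fills in wherever nothing matches:

```python
import numpy as np

# Hypothetical data, not from my actual problem
scores = np.array([95, 72, 40, 85])

# Checked in order; the first matching condition decides the value
conditions = [scores >= 90, scores >= 70]
choices = ["High", "Medium"]

# default is used for elements that match none of the conditions
labels = np.select(conditions, choices, default="Low")
print(labels.tolist())  # ['High', 'Medium', 'Low', 'Medium']
```

This is roughly what a nested IF in Excel does, just applied to the whole column at once.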
Assume the following dataframe:
data = {'ID': ['1254', '4568', '9547', '7856'],
        'Primary': [True, False, True, False],
        'Secondary': [True, False, False, True]}
df = pd.DataFrame(data)
print(df)
I want the new column, Tertiary, to be a function of the contents of Primary and Secondary. For example, Tertiary should equal "Yes" when both Primary and Secondary are True. Instead of iterating, I used np.select():
import pandas as pd
import numpy as np

data = {'ID': ['1254', '4568', '9547', '7856'],
        'Primary': [True, False, True, False],
        'Secondary': [True, False, False, True]}
df = pd.DataFrame(data)
print(df)

# one boolean mask per case; np.select() checks them in order
primary_secondary_flag_conditions = [
    (df['Primary'] == True) & (df['Secondary'] == True),
    (df['Primary'] == False) & (df['Secondary'] == False),
    (df['Primary'] == True) & (df['Secondary'] == False),
    (df['Primary'] == False) & (df['Secondary'] == True)
]

# the value to assign for each condition, in the same order
primary_secondary_flag_values = [
    "Yes",
    "No",
    "Maybe",
    "Maybe"
]

df['Tertiary'] = np.select(primary_secondary_flag_conditions, primary_secondary_flag_values, None)

# replacing all 'None' values (rows that matched no condition) with empty string ""
df.fillna("", inplace=True)
print(df)
The resulting dataframe is equivalent to:
import pandas as pd
import numpy as np

data = {'ID': ['1254', '4568', '9547', '7856'],
        'Primary': [True, False, True, False],
        'Secondary': [True, False, False, True],
        'Tertiary': ["Yes", "No", "Maybe", "Maybe"]}
df = pd.DataFrame(data)
print(df)
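For completeness, here is a rough sketch of how the same approach could be applied to the dataframe from my original question. It uses one np.select() call per column and then joins the pieces with ", ". The names airport, city and status are just ones I picked for this example, and the empty-string default for Value3 is an assumption, since my original logic had no else branch there:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Key': ['1234', '4321', '2341', '4132'],
                   'Value1': ['JFK', 'LAX', 'ATL', 'CLT'],
                   'Value2': ['NYC', 'CA', 'GA', 'NC'],
                   'Value3': ['Yes', 'No', 'No', 'No']})

# One np.select() per column, mirroring each if/else block from the question
airport = np.select([df['Value1'] == 'JFK'], ['John F Kennedy'], default='N/A')
city = np.select([df['Value2'] == 'NYC'], ['New York City'], default='N/A')
# Assumption: rows that are neither 'Yes' nor 'No' get an empty string
status = np.select([df['Value3'] == 'Yes', df['Value3'] == 'No'],
                   ['Able', 'Unable'], default='')

# Concatenate the three per-column results into a single summary string per key
result_df = pd.DataFrame({
    'Key': df['Key'],
    'Result': pd.Series(airport) + ', ' + pd.Series(city) + ', ' + pd.Series(status),
})
print(result_df)
```

The first row comes out as 'John F Kennedy, New York City, Able', which matches what I was after. With many more conditions, each condition list just gets longer; the overall structure stays the same.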