For this example i have 2 dataframes, the genre column in df1 is column 3 but in df2 it is column 2, also the header is slightly different. in my actual script i have to search the column names because the column location varies in each sheet it reads.
how do i recognise different header names as the same thing?
df1 = pd.DataFrame({'TITLE': ['The Matrix','Die Hard','Kill Bill'],
               'VENDOR ID': ['1234','4321','4132'],
               'GENRE(S)': ['Action', 'Adventure', 'Drama']})
df2 = pd.DataFrame({'TITLE': ['Toy Story','Shrek','Frozen'],
               'Genre': ['Animation', 'Adventure', 'Family'],
               'VENDOR ID': ['5678','8765','8576']})
column_names = ['TITLE','VENDOR ID','GENRE(S)']
appended_data = []
sheet1 = df1[df1.columns.intersection(column_names)]
appended_data.append(sheet1)
sheet2 = df2[df2.columns.intersection(column_names)]
appended_data.append(sheet2)
appended_data = pd.concat(appended_data, sort=False)
output:
        TITLE VENDOR ID   GENRE(S)
0  The Matrix      1234     Action
1    Die Hard      4321  Adventure
2   Kill Bill      4132      Drama
0   Toy Story      5678        NaN
1       Shrek      8765        NaN
2      Frozen      8576        NaN
desired output:
        TITLE VENDOR ID   GENRE(S)
0  The Matrix      1234     Action
1    Die Hard      4321  Adventure
2   Kill Bill      4132      Drama
0   Toy Story      5678  Animation
1       Shrek      8765  Adventure
2      Frozen      8576     Family
 
    