I have concat multiple data sources and each data source has it's own unique columns based on the data source. I want to merge rows who match on a list of columns and keep all unique data sources in the same row.
Example:
df1 = pd.DataFrame({'SharedData': ['A', 'B', 'C', 'D', 'E'],
                    'df1Data': ['1', '2', '3', '4', '5']})
df2 = pd.DataFrame({'SharedData': ['D', 'E', 'F', 'G', 'H'],
                    'df2Data': ['4', '5', '6', '7', '8']})
newdf = pd.concat([df1,df2], axis=0, ignore_index=True)
I need the resulting data set to go from before and after below.
Before Data Set:
| SharedData | df1Data | df2Data | 
|---|---|---|
| A | 1 | |
| B | 2 | |
| C | 3 | |
| D | 4 | |
| E | 5 | |
| D | 4 | |
| E | 5 | |
| F | 6 | |
| G | 7 | |
| H | 8 | 
After Data Set:
| SharedData | df1Data | df2Data | 
|---|---|---|
| A | 1 | |
| B | 2 | |
| C | 3 | |
| D | 4 | 4 | 
| E | 5 | 5 | 
| F | 6 | |
| G | 7 | |
| H | 8 | 
I need to deduplicate rows where SharedData matches, with a new row that contains all of the df specific data.
 
    