I have a DataFrame with duplicate identifiers, but the data attributes differ between the duplicated rows. I want to remove the duplicate rows by combining their data into new columns.
Sample Data:
| id | type | subtype | value | 
|---|---|---|---|
| 111 | a | sub1 | 100 | 
| 111 | b | sub2 | 200 | 
| 112 | c | sub2 | 100 | 
| 113 | a | sub3 | 100 | 
| 114 | b | sub1 | 300 | 
| 114 | c | sub1 | 100 | 
```python
import pandas as pd

data = {'id': ['111', '111', '112', '113', '114', '114'],
        'type': ['a', 'b', 'c', 'a', 'b', 'c'],
        'subtype': ['sub1', 'sub2', 'sub2', 'sub3', 'sub1', 'sub1'],
        'value': [100, 200, 100, 100, 300, 100]}
df = pd.DataFrame(data)
df
```
The desired output would look like this, where rows sharing an identifier are combined by spreading their data into additional columns:
| id | type | subtype | value | type1 | subtype1 | value1 | 
|---|---|---|---|---|---|---|
| 111 | a | sub1 | 100 | b | sub2 | 200 | 
| 112 | c | sub2 | 100 | null | null | null | 
| 113 | a | sub3 | 100 | null | null | null | 
| 114 | b | sub1 | 300 | c | sub1 | 100 | 
```python
import pandas as pd

output = {'id': ['111', '112', '113', '114'],
          'type': ['a', 'c', 'a', 'b'],
          'subtype': ['sub1', 'sub2', 'sub3', 'sub1'],
          'value': [100, 100, 100, 300],
          'type1': ['b', None, None, 'c'],
          'subtype1': ['sub2', None, None, 'sub1'],
          'value1': [200, None, None, 100]}
df1 = pd.DataFrame(output)
df1
```
Note that in the real data there could be more than two duplicate rows per identifier, so the number of extra column sets isn't fixed.
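For reference, here is a minimal sketch of one direction I think might work: number the duplicates within each `id` with `groupby().cumcount()`, then `pivot()` them into separate column sets. The `occurrence` helper column name is just something I made up for illustration, and I'm not sure this is the cleanest way to do it:

```python
import pandas as pd

data = {'id': ['111', '111', '112', '113', '114', '114'],
        'type': ['a', 'b', 'c', 'a', 'b', 'c'],
        'subtype': ['sub1', 'sub2', 'sub2', 'sub3', 'sub1', 'sub1'],
        'value': [100, 200, 100, 100, 300, 100]}
df = pd.DataFrame(data)

# Number each row within its id group: 0 for the first occurrence, 1 for the second, ...
df['occurrence'] = df.groupby('id').cumcount()

# Spread each occurrence into its own set of columns; missing combinations become NaN
wide = df.pivot(index='id', columns='occurrence', values=['type', 'subtype', 'value'])

# Flatten the resulting MultiIndex columns: ('type', 0) -> 'type', ('type', 1) -> 'type1', ...
# The columns come out grouped by attribute (type, type1, subtype, subtype1, ...),
# not in exactly the order shown in the desired output above.
wide.columns = [f'{name}{n}' if n else name for name, n in wide.columns]

wide = wide.reset_index()
print(wide)
```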
Please help me out if you can; much appreciated!