Suppose we have a scenario with multiple customers and each customer can purchase different products of different quantities. This is illustrated in the dataframe below - with multiple ids and each id may have different column values as shown below:
   id   item    No_Units     No_Purchases
    1      140         46            1
    1      160         16            1
    2      160         46            2
    3      140         16            1
From this we want to achieve a single row, so that we can view each customer's purchases on a single row. Updated dataframe should based on the following conditions:
- Each item in the items' column should have its own column similar to converting it into dummy variables. Therefore, item_140, item_160 etc. The content in item item_46 would be the item number 46. If the customer doesn't the item, it should be assigned a zero.
- No_Units and No_Purchases should split in relation to the associated item column - eg: No_Units_item_140,No_Units_item_160, No_Purchases_item_140, No_Purchases_item_160.
NB: there are multiple columns in addition to the columns shown above. Output Dataframe:
   id item_140   item_160   No_Units_item_140     No_Purchases_140  No_Units_160    No_Purchases_160
    1     140        160            46                    1             16                 2
    2     0          160            0                     0             46                 2           
    3     140        0              16                    1             0                  0
The first part of the code is to create the dummy variables:
df = pd.get_dummies(df, columns=['item'],drop_first=True, prefix=['item'])
I have attempted to code the solution, however there are issue with speed,labels, merging and inserting original values into the dummy dataframe:
 output = pd.DataFrame()
 for customer in train_data['id'].unique():
     df_user = df[df['id']==customer]
     t = pd.DataFrame()
     for feat in ['No_Units','No_Purchases']:
         t = pd.DataFrame([df_user['item'],df_user[feat]]).T
         t  = pd.concat([t,t1],axis=1)
         t=t.stack().to_frame().T
        ## t.columns = t.iloc[0]
         t.columns = ['{}_{}'.format(*c) for c in t.columns]
     t['id'] = df_user['id']
     output = pd.concat([output,t],ignore_index=True)
I do know that dict() will speed this up considerably aswell.
 
    