I have a question that concerns the use of the pivot function in pandas. I have a table (df_init) with a bunch of customer Ids (7000 different Ids) and the product codes they purchased
| CST_ID | PROD_CODE | 
|---|---|
| 11111 | 1234 | 
| 11111 | 2345 | 
| 11111 | 5425 | 
| 11111 | 9875 | 
| 22222 | 2345 | 
| 22222 | 9251 | 
| 22222 | 1234 | 
| 33333 | 6542 | 
| 33333 | 7498 | 
Each Id can be repeated at most 4 time in the table, but can appear less than 4 times (e,g, 22222 and 33333). I want to reorganize that table as follows (df_fin)
| CST_ID | PROD_1 | PROD_2 | PROD_3 | PROD_4 | 
|---|---|---|---|---|
| 11111 | 1234 | 2345 | 5425 | 9875 | 
| 22222 | 2345 | 9251 | 1234 | NaN | 
| 33333 | 6542 | 7498 | NaN | NaN | 
Good news is, I have found a way to do so. Bad news I am not satisfied as it loops over the Customer Ids nd takes a while. Namely I count the occurrences of a certain Id while looping over the column and add that to a list, then append this list as a new variable to df_init
to_append = []
for index in range(len(df_init)):
    temp = df_init.iloc[:index+1]['CST_ID'] == df_init.iloc[index]['CST_ID'] # ['CST_ID']== df_init.iloc[index]['CST_ID']]
    counter = sum(list(temp))
    to_append.append(counter)
df_init['Product_number'] = to_append
Afterwards I pivot and rename the columns
df_fin = df_init.pivot(index='CST_ID', columns='Product_number', values='PROD_CODE').rename_axis(None).reset_index()
df_fin.columns=['CST_ID', 'pdt1', 'pdt2', 'pdt3', 'pdt4']
Of course this solution works just fine, but looping in order to create the column which I use for the columns specification of the Pivot takes time. Hence I was wondering if there was a better solution (perhapes embedded already in Pandas or in the Pivot method) to do so.
Thanks to anyone who is willing to participate
Best
 
     
    