I have a pandas DF that looks like below and I'm trying to transpose counts of values in different rows to the format below :
   COL1    COL2    COL3
    C1      None    None
    C1      C2      None
    C1      C1      None
    C1      C2      C3
to
    C1  C2  C3
     1   0   0
     1   1   0
     2   0   0
     1   1   1
I found this answer here (Count occurrences of items in Series in each row of a DataFrame)
This does what I need but it's extremely slow on my dataset (4 Million Rows). I tried to do it with 400,000 rows and it still isn't running at an acceptable speed.
I believe it's the list doing a lookup through the massive dataset and with apply being a essentially a for loop. It's massive loops that's slowing the process. The potential solutions here essentially mention not to use Apply which would make the execution here tedious given that I have C1,C2...C100 in my list and 10 vertical columns to validate counts against. Any tips on what I might be able to try here to improve performance?
Here's the code below :
df_store = df.apply(pd.Series.value_counts, axis=1)[list_lookup].fillna(0)
 
     
    