I have a pandas DataFrame of shape (607875, 12294). The data is sparse and looks like:
     ID BB CC DD ...
0   abc 0  0  1  ...
1   bcd 0  0  0  ...
2   abc 0  0  1  ...
...
I converted it to sparse form by calling
dataframe = dataframe.to_sparse()
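(Since to_sparse() was removed in recent pandas versions, the closest equivalent I'm aware of is converting columns to sparse dtypes. A minimal sketch, assuming integer data with 0 as the fill value and the column names from the example above:)

```python
import pandas as pd

df = pd.DataFrame({"ID": ["abc", "bcd", "abc"],
                   "BB": [0, 0, 0],
                   "CC": [0, 0, 0],
                   "DD": [1, 0, 1]})

# Convert only the numeric columns; keeping "ID" dense avoids
# sparsifying a string column that groupby needs anyway.
num_cols = df.columns.drop("ID")
df[num_cols] = df[num_cols].astype(pd.SparseDtype("int64", 0))

print(df.dtypes["DD"])  # Sparse[int64, 0]
```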
Later, I grouped it by ID and summed the rows with
dataframe = dataframe.groupby("ID").sum()
For smaller dataframes this works perfectly well, but at this size it ran for an hour and did not finish. Is there a way to speed it up or work around it? Is there another sparse method I can use, given that the to_sparse method is deprecated?
The output DataFrame would have shape (2000, 12294) and look like this (assuming no other 1 appears in the DD column for rows with ID abc):
     ID BB CC DD ...
0   abc 0  0  2  ...
1   bcd 0  0  0  ...
...
I have 32 GB of RAM on my PC, so that should be enough.
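(One workaround I've seen suggested for grouped sums over sparse data is to bypass the pandas groupby entirely: convert the numeric block to a scipy CSR matrix and multiply it by a one-hot group-indicator matrix, which sums the rows of each group in sparse arithmetic. A sketch under that assumption, using the small example data from above:)

```python
import numpy as np
import pandas as pd
from scipy import sparse

df = pd.DataFrame({"ID": ["abc", "bcd", "abc"],
                   "BB": [0, 0, 0],
                   "CC": [0, 0, 0],
                   "DD": [1, 0, 1]})

# Sparse matrix of the numeric columns, one row per original row.
values = sparse.csr_matrix(df.drop(columns="ID").to_numpy())

# codes[i] is the group index of row i; the indicator matrix has shape
# (n_groups, n_rows), so indicator @ values sums rows within each group.
codes, groups = pd.factorize(df["ID"], sort=True)
indicator = sparse.csr_matrix(
    (np.ones(len(codes)), (codes, np.arange(len(codes)))),
    shape=(len(groups), len(codes)),
)
summed = indicator @ values

result = pd.DataFrame.sparse.from_spmatrix(
    summed, index=groups, columns=df.columns.drop("ID")
)
print(result)
```

The result stays sparse end to end, so it should avoid densifying all 12294 columns during the aggregation.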