I have a DataFrame with roughly 2.8 million rows (2,793,015), shown below:
df
Out[10]: 
         ClaimId  ServiceSubCodeKey  ClaimRowNumber  SscRowNumber
0        1902659                183               1             1
1        1902659               2088               1             2
2        1902663               3274               2             1
3        1902674                 12               3             1
4        1902674                 23               3             2
         ...                ...             ...           ...
2793010  2563847               3109          603037             4
2793011  2563883               3109          603038             1
2793012  2564007               3626          603039             1
2793013  2564007               3628          603039             2
2793014  2564363               3109          603040             1
[2793015 rows x 4 columns]
I am trying to one-hot encode this in Python with the code below, but I run into a MemoryError:
import pandas as pd
columns = (
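    pd.get_dummies(df["ServiceSubCodeKey"], dtype=int)  # 0/1 ints; pandas >= 2.0 defaults to bool
    .reindex(range(df.ServiceSubCodeKey.min(),
                   df.ServiceSubCodeKey.max() + 1), axis=1, fill_value=0)
    # now there is one column for every key from min to max
    .astype(str)  # the MemoryError is raised here (lib.astype_str)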
    )
# join each row's '0'/'1' strings into one large integer code per row
codes_values = [int(''.join(r)) for r in columns.itertuples(index=False)]
codes = pd.Series({'test': codes_values}).explode()
codes.index = df.index
# groupby and aggregate the values into lists
dfg = codes.groupby(df.ClaimId).agg(list).reset_index()
# sum the lists; doing this with a pandas function also does not work, so no .sum or .apply
summed_lists = list()
for r, v in dfg.iterrows():
    summed_lists.append(str(sum(v[0])))
# assign the list of strings to a column
dfg['sums'] = summed_lists
# perform the remainder of the functions on the sums column
dfg['final'] = dfg.sums.str.pad(width=columns.shape[1], fillchar='0').str.rstrip('0')
# merge df and dfg.final
dfm = pd.merge(df, dfg[['ClaimId', 'final']], on='ClaimId')
dfm
The traceback ends with:
  File "pandas/_libs/lib.pyx", line 574, in pandas._libs.lib.astype_str
MemoryError
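The traceback points at `.astype(str)`, which has to turn the reindexed one-hot frame into an object array of Python strings. That frame has len(df) × (max − min + 1) cells. Using only values visible above (2,793,015 rows, smallest key at most 12, largest at least 3628), that is at least 2,793,015 × 3,617 ≈ 10.1 billion cells, i.e. more than 80 GB for the 8-byte object pointers alone on a 64-bit build, before the strings themselves are counted. As a hedged sketch rather than a drop-in answer: the `final` string only records which key offsets occur in each claim, so it can be built per claim without the wide matrix. `claim_patterns` and `add_final_column` are hypothetical helper names, and the sketch assumes each key occurs at most once per claim (as in the rows shown); the original digit-summing would produce a `2` where a key repeats, which this version does not reproduce.

```python
import pandas as pd


def claim_patterns(df: pd.DataFrame) -> pd.Series:
    """Hypothetical helper: one '0'/'1' string per ClaimId, meant to match `final`.

    Character i is '1' when ServiceSubCodeKey == kmin + i occurs in the claim,
    where kmin is the smallest key in the whole frame. The string stops at the
    last '1', mirroring the original .str.rstrip('0').
    Assumption: each key occurs at most once per claim.
    """
    kmin = int(df["ServiceSubCodeKey"].min())

    def build(keys: pd.Series) -> str:
        offsets = (keys - kmin).tolist()      # column position of each key
        chars = ["0"] * (max(offsets) + 1)    # shortest string ending in '1'
        for pos in offsets:
            chars[pos] = "1"
        return "".join(chars)

    return df.groupby("ClaimId")["ServiceSubCodeKey"].agg(build)


def add_final_column(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: attach each claim's pattern to all of its rows."""
    out = df.copy()
    out["final"] = out["ClaimId"].map(claim_patterns(df))
    return out
```

The result can still be large, since each claim's string is as long as its largest key offset plus one; `map` does reuse one string object per claim instead of copying it for every row.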
How can I do this in automated batches so that it doesn't raise a MemoryError?
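If the original string pipeline should be kept, one way to batch it (a sketch under stated assumptions) is to process whole claims together, so no claim's digits are split across batches, and to reindex every batch against the global min/max so all strings share the same width. `encode_batch`, `encode_in_batches` and the default `claims_per_batch=1_000` are hypothetical; the right batch size depends on available RAM, since peak memory per batch grows roughly with (rows in the batch) × (max − min + 1).

```python
import pandas as pd


def encode_batch(batch: pd.DataFrame, kmin: int, kmax: int) -> pd.Series:
    """Hypothetical helper: the question's string pipeline for one batch of claims."""
    width = kmax - kmin + 1
    onehot = (
        pd.get_dummies(batch["ServiceSubCodeKey"], dtype=int)
        # reindex against the GLOBAL key range so every batch has the same layout
        .reindex(range(kmin, kmax + 1), axis=1, fill_value=0)
        .astype(str)
    )
    # object dtype keeps the codes as arbitrary-precision Python ints
    codes = pd.Series(
        [int("".join(row)) for row in onehot.itertuples(index=False)],
        index=batch.index,
        dtype=object,
    )
    summed = codes.groupby(batch["ClaimId"]).agg(lambda v: str(sum(v)))
    return summed.str.pad(width=width, fillchar="0").str.rstrip("0")


def encode_in_batches(df: pd.DataFrame, claims_per_batch: int = 1_000) -> pd.DataFrame:
    """Hypothetical helper: run encode_batch over groups of whole claims."""
    kmin = int(df["ServiceSubCodeKey"].min())
    kmax = int(df["ServiceSubCodeKey"].max())
    # number claims 0, 1, 2, ... in order of first appearance, then bucket them
    claim_no, _ = pd.factorize(df["ClaimId"])
    batch_no = claim_no // claims_per_batch
    parts = [encode_batch(batch, kmin, kmax) for _, batch in df.groupby(batch_no)]
    final = pd.concat(parts)  # index: ClaimId, unique because claims never span batches
    out = df.copy()
    out["final"] = out["ClaimId"].map(final)  # same rows as the original inner merge
    return out
```

Usage would be along the lines of `dfm = encode_in_batches(df, claims_per_batch=500)`; changing the batch size should change only memory use and run time, not the `final` values.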
