I have a pandas dataframe that looks like:
word | ss |
ABC  | E  |
DHS  | H  |
ABC  | H  |
ABC  | C  |
ABC  | H  |
SJA  | C  |
SJA  | H  |
But I'm trying to making something like:
word | H | E | C
ABC  | 2 | 1 | 1
DHS  | 1 | 0 | 0
SJA  | 1 | 0 | 1
What I've done so far:
helix = []
sheet = []
coil = []
for ss in big_ss_list: # this is the ss column in a list form.
    counter = counter + 1
    print ('position', counter, 'of', len(big_ss_list))
    if ss == 'H':
        sheet = sheet + [0]
        helix = helix + [1]
        coil = coil + [0]
    elif ss == 'E':
        sheet = sheet + [1]
        helix = helix + [0]
        coil = coil + [0]
    elif ss == 'C':
        sheet = sheet + [0]
        helix = helix + [0]
        coil = coil + [1]
fdf = pd.DataFrame(list(zip(big_gram_list, big_ss_list, helix, sheet, coil)), columns =['word', 'secondary_structure', 'helix', 'sheet', 'coil'])
fdf = fdf.iloc[(fdf.word).argsort()] # sort based on alphabetically order
fdf = fdf.groupby('word',as_index=False).sum()
But this process is long because there are ~37mil rows and takes significant time to run.
Is there any quicker way?
