Note: this question is indeed a duplicate of Split pandas dataframe string entry to separate rows, but the answer provided here is more generic and informative, so, with all due respect, I chose not to delete the thread.
I have a 'dataset' with the following format:
     id | value | ...
--------|-------|------
      a | 156   | ...
    b,c | 457   | ...
e,g,f,h | 346   | ...
    ... | ...   | ...
and I would like to normalize it so that each id gets its own row, with the other values duplicated:
     id | value | ...
--------|-------|------
      a | 156   | ...
      b | 457   | ...
      c | 457   | ...
      e | 346   | ...
      g | 346   | ...
      f | 346   | ...
      h | 346   | ...
    ... | ...   | ...
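One possible approach is to split each id string into a list and then give every list element its own row with DataFrame.explode. This is a minimal sketch that assumes pandas 0.25 or later, because that is the release where explode was added. The sample frame reproduces only the three rows shown above and leaves out the "..." columns.

```python
import pandas as pd

# Sample data mirroring the table above (the "..." columns are omitted).
df = pd.DataFrame({
    "id": ["a", "b,c", "e,g,f,h"],
    "value": [156, 457, 346],
})

# Replace each comma-separated string with a list, then explode the list
# so every element gets its own row; the other columns are copied along.
normalized = (
    df.assign(id=df["id"].str.split(","))
      .explode("id")
      .reset_index(drop=True)  # explode keeps the original index labels
)
print(normalized)
```

Because explode duplicates every other column automatically, no helper column and no groupby are needed.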
What I'm doing is applying pandas' split-apply-combine pattern with .groupby, which yields a tuple (group key, sub-DataFrame) for each group.
I created a column to group by, which simply counts the ids in each row:
df['count_ids'] = df['id'].str.split(',').str.len()
     id | value | count_ids
--------|-------|----------
      a | 156   | 1
    b,c | 457   | 2
e,g,f,h | 346   | 4
    ... | ...   | ...
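The following sketch reproduces the helper column and the groupby iteration on the three sample rows. It is illustrative only. Series.str.len() also works on a Series of lists, so the lambda is not required. Iterating over the groupby object yields the (key, sub-DataFrame) tuples described above.

```python
import pandas as pd

df = pd.DataFrame({
    "id": ["a", "b,c", "e,g,f,h"],
    "value": [156, 457, 346],
})

# Number of comma-separated ids per row; .str.len() measures each list.
df["count_ids"] = df["id"].str.split(",").str.len()

# Each iteration step yields a (group key, sub-DataFrame) tuple.
groups = {}
for count, group in df.groupby("count_ids"):
    groups[count] = group
    print(count, group["id"].tolist())
```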
The way I'm duplicating the rows is as follows:
pd.concat([group] * count_ids)
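Below is a runnable sketch of the group-wise duplication, with pd.concat standing in for DataFrame.append because append was removed in pandas 2.0. It also shows a swapped-in alternative, df.index.repeat. That method repeats each row in place, so the copies stay adjacent, and the split ids can then be written back in order. The sample data and the variable names are hypothetical and used for illustration only.

```python
import pandas as pd

df = pd.DataFrame({
    "id": ["a", "b,c", "e,g,f,h"],
    "value": [156, 457, 346],
})
split_ids = df["id"].str.split(",")
df["count_ids"] = split_ids.str.len()

# Group-wise duplication as in the question: each group is repeated
# count_ids times. With several rows per group the copies of one row
# would not be adjacent (row1, row2, row1, row2, ...).
parts = [pd.concat([group] * count) for count, group in df.groupby("count_ids")]
dup_grouped = pd.concat(parts)

# Alternative: repeat each index label count_ids times, which keeps
# the copies of a row together and preserves the original row order.
dup = df.loc[df.index.repeat(df["count_ids"])].copy()

# Overwrite id with the flattened individual ids; both sequences follow
# the original row order, so they line up one-to-one.
dup["id"] = [i for ids in split_ids for i in ids]
dup = dup.drop(columns="count_ids").reset_index(drop=True)
print(dup)
```

The repeat-based variant avoids building and concatenating one frame per group. It also leaves the rows in their original order.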
I'm slowly making progress, but it is getting really complex, and I would appreciate any best practices or recommendations you can share for this type of problem.