Build a new DataFrame from an existing one with a column containing a list (populate new lines using a list)

Question

I have a DataFrame like this:

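import pandas as pd

df = pd.DataFrame({'name': ['toto', 'tata', 'tati'], 'choices': 0})
df['choices'] = df['choices'].astype(object)
# .at sets a single cell and avoids chained assignment, which may write to a copy
df.at[0, 'choices'] = [1, 2, 3]
df.at[1, 'choices'] = [5, 4, 3, 1]
df.at[2, 'choices'] = [6, 3, 2, 1, 5, 4]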

print(df)

             choices  name
0           [1, 2, 3]  toto
1        [5, 4, 3, 1]  tata
2  [6, 3, 2, 1, 5, 4]  tati

I'd like to build a DataFrame based on df like this:

             choice  rank  name
0                 1     0  toto
1                 2     1  toto
2                 3     2  toto
3                 5     0  tata
4                 4     1  tata
5                 3     2  tata
6                 1     3  tata
7                 6     0  tati
8                 3     1  tati
9                 2     2  tati
10                1     3  tati
11                5     4  tati
12                4     5  tati

I want to create one new row per list element, using each element's position in its list as the rank.

I did this:

import numpy as np

size = df['choices'].map(len).sum()
df2 = pd.DataFrame(index=range(size), columns=df.columns)
del df2['choices']
df2['choice'] = np.nan
df2['rank'] = np.nan

k = 0
for i in df.index:
    choices = df['choices'][i]
    for rank, choice in enumerate(choices):
        # .at writes into df2 directly rather than through a chained lookup
        df2.at[k, 'name'] = df['name'][i]
        df2.at[k, 'choice'] = choice
        df2.at[k, 'rank'] = rank
        k += 1

But I would prefer a vectorized solution. Is that possible with Python/pandas?

score 5 · Accepted Answer · edited May 23 '17 at 11:50

In [4]: s = df.choices.apply(pd.Series).stack().dropna().astype(int) # shorter lists are padded with NaN; drop any padding and cast back to int

In [5]: s.name = 'choices' # needs a name to join

In [6]: del df['choices']

In [7]: df1 = df.join(s.reset_index(level=1))

In [8]: df1.columns = ['name', 'rank', 'choice']

In [9]: df1.sort_values(['name', 'rank']).reset_index(drop=True)
Out[9]: 
    name  rank  choice
0   tata     0       5
1   tata     1       4
2   tata     2       3
3   tata     3       1
4   tati     0       6
5   tati     1       3
6   tati     2       2
7   tati     3       1
8   tati     4       5
9   tati     5       4
10  toto     0       1
11  toto     1       2
12  toto     2       3

This is related to another solution of mine, but in your case you keep the inner index level (as rank) instead of dropping it.
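
On pandas 0.25 or later, the same reshaping can likely be written more directly with DataFrame.explode, which was added after this answer was written. The following is only a sketch under that assumption, not the method used above: it rebuilds the example frame, explodes the list column, and derives the rank with groupby(level=0).cumcount(). The variable name out is purely illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['toto', 'tata', 'tati'],
    'choices': [[1, 2, 3], [5, 4, 3, 1], [6, 3, 2, 1, 5, 4]],
})

# One row per list element; the original row label is repeated for each element.
out = df.explode('choices').rename(columns={'choices': 'choice'})

# 0-based position of each element within its original row, i.e. the rank.
out['rank'] = out.groupby(level=0).cumcount()

# explode leaves an object-dtype column, so cast the values back to integers.
out['choice'] = out['choice'].astype(int)

# Fresh 0..n-1 index and the column order requested in the question.
out = out.reset_index(drop=True)[['choice', 'rank', 'name']]
print(out)
```

Unlike the sorted result above, this keeps the rows in the original order (toto, tata, tati), which matches the layout asked for in the question.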
