How to convert column with list of values into rows in Pandas DataFrame

Question

Hi, I have a DataFrame like this:

    A             B 
0:  some value    [[L1, L2]]

I want to change it into:

    A             B 
0:  some value    L1
1:  some value    L2

How can I do that?

Pygirl · Answer 1 · 2019-11-08T09:02:32.403

79

Pandas >= 0.25

import pandas as pd

df1 = pd.DataFrame({'A':['a','b'],
               'B':[[['1', '2']],[['3', '4', '5']]]})
print(df1)

    A   B
0   a   [[1, 2]]
1   b   [[3, 4, 5]]

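# Each cell holds a list nested inside another list, so explode twice:
# the first call unwraps the outer list, the second gives one row per item.
df1 = df1.explode('B')
df1.explode('B')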

I don't know how good this approach is but it works when you have a list of items.
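A minimal sketch of the same idea as a single chain, assuming pandas >= 0.25 and the nested sample data above. The name `flat` is only illustrative, and `reset_index(drop=True)` is added to replace the repeated index labels with a fresh 0..n-1 index.

```python
import pandas as pd

df1 = pd.DataFrame({'A': ['a', 'b'],
                    'B': [[['1', '2']], [['3', '4', '5']]]})

# First explode unwraps the outer list; the second yields one row per item.
# reset_index(drop=True) replaces the repeated labels (0, 0, 1, 1, 1).
flat = df1.explode('B').explode('B').reset_index(drop=True)
print(flat)
```

For a column of flat lists, a single `explode('B')` should be enough.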

edited Nov 08 '19 at 09:02

answered Aug 13 '19 at 06:38

Pygirl


3

Perfect! I recalled vaguely that there's a function to perform this in a single step but couldn't quite remember the name and wasn't able to find it again in the documentation. Almost gave in to go with the function chaining solution until I found this :) – kerwei Oct 21 '19 at 07:14
2

Better than all the other provided solutions – Shiv Krishna Jaiswal Jan 31 '20 at 03:49
1

You might want to check this issue before using it (maybe wait for the 0.26 release): https://github.com/pandas-dev/pandas/issues/30748 – NoCompliance Feb 25 '20 at 02:22
1

Rule of thumb: if I can explain in few words, it should take few steps. This answer seems better than the accepted one. – Victor Ribeiro Dec 30 '20 at 16:30
1

`df1.explode('B')` does the job. Thanks! – Tushar Jun 04 '22 at 01:11
1

You can also add `.reset_index(drop=True)` at the end of the line to replace the repeated index values, so `df1.explode('B').reset_index(drop=True)` will be the answer – msklc Oct 05 '22 at 22:15

MaxU - stand with Ukraine · Accepted Answer · 2017-11-29T13:17:38.760

38

You can do it this way:

In [84]: df
Out[84]:
               A               B
0     some value      [[L1, L2]]
1  another value  [[L3, L4, L5]]

In [85]: (df['B'].apply(lambda x: pd.Series(x[0]))
   ....:         .stack()
   ....:         .reset_index(level=1, drop=True)
   ....:         .to_frame('B')
   ....:         .join(df[['A']], how='left')
   ....: )
Out[85]:
    B              A
0  L1     some value
0  L2     some value
1  L3  another value
1  L4  another value
1  L5  another value
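If column `B` holds flat lists instead of nested ones, the same chain can be sketched with `pd.Series` applied directly. The sample data below is hypothetical, and the explicit `dropna()` is an addition so that the NaN padding is removed regardless of how the installed pandas version handles missing values in `stack()`.

```python
import pandas as pd

# Hypothetical input: column B holds flat lists rather than nested ones.
df = pd.DataFrame({'A': ['some value', 'another value'],
                   'B': [['L1', 'L2'], ['L3', 'L4', 'L5']]})

long_df = (df['B'].apply(pd.Series)         # one column per list position, NaN-padded
           .stack()                         # wide to long
           .dropna()                        # remove padding for shorter lists
           .reset_index(level=1, drop=True)
           .to_frame('B')
           .join(df[['A']], how='left'))    # bring column A back by index
print(long_df)
```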

UPDATE: a more generic solution

edited Nov 29 '17 at 13:17

answered Oct 10 '16 at 09:32

MaxU - stand with Ukraine


`lambda x: pd.Series(x[0])` should be changed to `lambda x: pd.Series(x)` in case of flat list values in column `B` – soupault Nov 29 '17 at 13:40
1

@soupault, that's correct, thank you! This code works for the particular question that was asked. Partially because of that I have posted a link to a more generic solution... – MaxU - stand with Ukraine Nov 29 '17 at 13:59
@MaxU how can I do that for two columns if they have the same number of values in the list? – nurma_a Jul 29 '18 at 18:41
@nurma_a, check [this solution](https://stackoverflow.com/questions/12680754/split-explode-pandas-dataframe-string-entry-to-separate-rows/40449726#40449726) – MaxU - stand with Ukraine Jul 29 '18 at 18:54
Hi @MaxU--How can we do this in the opposite way? I mean wider format to long format. – Roy May 07 '21 at 13:17
@Roy, for the output dataframe in my answer: `df.groupby("A")["B"].apply(list)` – MaxU - stand with Ukraine May 07 '21 at 13:25
@MaxU. Great. Thanks. If we want only the column value (no index), how can we do that too? – Roy May 07 '21 at 13:43
1

Oh yeah. Thank you for informing, @MaxU :) – Roy May 07 '21 at 14:11
@Roy, I don't quite understand what you mean by "only the column value (no index)". It would be better to raise a new SO question with a small sample input dataset and the desired output – MaxU - stand with Ukraine May 07 '21 at 14:45

score 9 · Answer 3 · edited Jan 23 '19 at 15:28

Faster solution with chain.from_iterable and numpy.repeat:

from itertools import chain
import numpy as np
import pandas as pd

df = pd.DataFrame({'A':['a','b'],
                   'B':[[['A1', 'A2']],[['A1', 'A2', 'A3']]]})

print (df)
   A               B
0  a      [[A1, A2]]
1  b  [[A1, A2, A3]]


df1 = pd.DataFrame({ "A": np.repeat(df.A.values, 
                                    [len(x) for x in (chain.from_iterable(df.B))]),
                     "B": list(chain.from_iterable(chain.from_iterable(df.B)))})

print (df1)
   A   B
0  a  A1
1  a  A2
2  b  A1
3  b  A2
4  b  A3
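For flat lists one level of `chain.from_iterable` is enough. The following is a sketch under that assumption, with hypothetical sample data; `lengths` is an illustrative helper name.

```python
from itertools import chain

import numpy as np
import pandas as pd

# Hypothetical input with flat (non-nested) lists in column B.
df = pd.DataFrame({'A': ['a', 'b'],
                   'B': [['A1', 'A2'], ['A1', 'A2', 'A3']]})

lengths = [len(x) for x in df['B']]   # number of items in each list
df1 = pd.DataFrame({
    'A': np.repeat(df['A'].to_numpy(), lengths),   # repeat each A once per item
    'B': list(chain.from_iterable(df['B'])),       # flatten all lists in order
})
print(df1)
```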

Timings:

import random
import string

A = np.unique(np.random.randint(0, 1000, 1000))
B = [[list(string.ascii_letters[:random.randint(3, 10)])] for _ in range(len(A))]
df = pd.DataFrame({"A":A, "B":B})
print (df)
       A                                 B
0      0        [[a, b, c, d, e, f, g, h]]
1      1                       [[a, b, c]]
2      3     [[a, b, c, d, e, f, g, h, i]]
3      5                 [[a, b, c, d, e]]
4      6     [[a, b, c, d, e, f, g, h, i]]
5      7           [[a, b, c, d, e, f, g]]
6      8              [[a, b, c, d, e, f]]
7     10              [[a, b, c, d, e, f]]
8     11           [[a, b, c, d, e, f, g]]
9     12     [[a, b, c, d, e, f, g, h, i]]
10    13        [[a, b, c, d, e, f, g, h]]
...
...

In [67]: %timeit pd.DataFrame({ "A": np.repeat(df.A.values, [len(x) for x in (chain.from_iterable(df.B))]),"B": list(chain.from_iterable(chain.from_iterable(df.B)))})
1000 loops, best of 3: 818 µs per loop

In [68]: %timeit ((df['B'].apply(lambda x: pd.Series(x[0])).stack().reset_index(level=1, drop=True).to_frame('B').join(df[['A']], how='left')))
10 loops, best of 3: 103 ms per loop

This solution is about `125` times faster than the `apply` solution. – jezrael Oct 10 '16 at 13:22

score 3 · Answer 4 · answered Oct 10 '16 at 09:44


I can't find an elegant way to handle this, but the following code works:

import pandas as pd
import numpy as np
df = pd.DataFrame([{"a":1,"b":[[1,2]]},{"a":4, "b":[[3,4,5]]}])
z = []
for k,row in df.iterrows():
    for j in list(np.array(row.b).flat):
        z.append({'a':row.a, 'b':j})
result = pd.DataFrame(z)

Howardyan

This was the easiest one for me to understand. Thank you. – ihightower Apr 01 '17 at 13:35

score 1 · Answer 5 · answered Mar 12 '19 at 09:45


I think this is the fastest and simplest way:

df = pd.DataFrame({'A':['a','b'],
               'B':[[['A1', 'A2']],[['A1', 'A2', 'A3']]]})


df.set_index('A')['B'].apply(lambda x: pd.Series(x[0]))
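As written, this expression returns a wide frame with one column per list position, not one row per item. A hedged sketch of one way to finish the reshape is below; the `stack`/`dropna` chain and the names `wide` and `long_df` are additions, not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({'A': ['a', 'b'],
                   'B': [[['A1', 'A2']], [['A1', 'A2', 'A3']]]})

# Wide result: index A, columns 0..2, NaN where a list is shorter.
wide = df.set_index('A')['B'].apply(lambda x: pd.Series(x[0]))

long_df = (wide.stack()                      # wide to long, indexed by (A, position)
           .dropna()                         # remove padding for shorter lists
           .reset_index(level=1, drop=True)  # drop the position level
           .rename('B')
           .reset_index())                   # turn A back into a column
print(long_df)
```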

Adrian P

While this code may answer the question, providing additional context regarding how and/or why it solves the problem would improve the answer's long-term value. [Read this](https://stackoverflow.com/help/how-to-answer) – Shanteshwar Inde Mar 12 '19 at 10:05

score 0 · Answer 6 · answered Apr 04 '19 at 13:41

Here's another option

unpacked = (pd.melt(df.B.apply(pd.Series).reset_index(), id_vars='index')
            .merge(df, left_on='index', right_index=True))
unpacked = (unpacked.loc[unpacked.value.notnull(), :]
            .drop(columns=['index', 'variable', 'B'])
            .rename(columns={'value': 'B'}))

Apply pd.Series to column B: splits each list entry into a separate column
Melt this, so that each entry is a separate row (preserving the original index as a column)
Merge this back onto the original dataframe
Tidy up: drop rows with missing values and unnecessary columns, then rename the value column
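The steps above can be tried on a small frame. This is a sketch that assumes flat lists in column `B`; the sample data, the intermediate names, and the `sort_values` call (added only to restore the original row order) are illustrative additions.

```python
import pandas as pd

# Hypothetical input; this approach assumes flat lists in column B.
df = pd.DataFrame({'A': ['a', 'b'],
                   'B': [['A1', 'A2'], ['A1', 'A2', 'A3']]})

# Step 1: one column per list position, original index kept as a column.
wide = df['B'].apply(pd.Series).reset_index()
# Step 2: long form with columns index, variable, value.
melted = pd.melt(wide, id_vars='index')
# Step 3: bring back the original columns.
merged = melted.merge(df, left_on='index', right_index=True)
# Step 4: drop NaN padding, restore row order, tidy up the columns.
unpacked = (merged.loc[merged['value'].notnull()]
            .sort_values(['index', 'variable'])
            .drop(columns=['index', 'variable', 'B'])
            .rename(columns={'value': 'B'})
            .reset_index(drop=True))
print(unpacked)
```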
