ValueError: "cannot reindex from a duplicate axis" in groupby Pandas

Question

My dataframe looks like this:

    SKU #    GRP    CATG   PRD
0   54995  9404000  4040  99999
1   54999  9404000  4040  99999
2   55037  9404000  4040  1556894
3   55148  9404000  4040  1556894
4   55254  9404000  4040  1556894
5   55291  9404000  4040  1556894
6   55294  9404000  4040  1556895
7   55445  9404000  4040  1556895
8   55807  9404001  4040  1556896
9   49021  9404002  4040  1556897
10  49035  9404002  4040  1556897
11  27538  9404000  4040  1556898
12  27539  9404000  4040  1556899
13  27540  9404000  4040  1556894
14  27542  9404000  4040  1556900
15  27543  9404000  4040  1556900
16  27544  9404003  4040  1556901
17  27546  9404004  4040  1556902
18  99111  9404005  4040  1556903
19  99112  9404006  4040  1556904
20  99113  9404007  4040  1556905
21  99116  9404008  4040  1556906
22  99119  9404009  4040  1556907
23  99122  94040010 4040  1556908
24  99125  94040011 4040  1556909
25  86007  94040012 4040  1556910
26  86010  94040013 4040  1556911

And when I try to perform a group by operation on the above dataframe, I get the "cannot reindex from a duplicate axis" error.

df.groupby(['GRP','CATG'],as_index=False)['PRD'].min()

I tried to find out the duplicate indices using:

df[df.index.duplicated()]

But it didn't return anything. How can I go about resolving this issue?

I was not able to duplicate your problem with the given data. — Scott Boston, Feb 17 '20 at 20:48
This is often due to duplications in your columns. First, try `df.columns.duplicated().any()` and if there are any duplicated columns then drop them with `df.loc[:,~df.columns.duplicated()]` — Gene Burinsky, Feb 17 '20 at 20:54
I thought the problem was with one of these columns. I refreshed and reran the script on the subset. But after you pointed out that you were not able to replicate it, I isolated this subset and retried, and it didn't throw any error. I now have to figure out which of the 79 columns is responsible for the error. Thanks for the prompt response, Scott. — vgaurav, Feb 17 '20 at 21:08
@Gene: Thanks for the input. That was exactly the problem in my df. — vgaurav, Feb 18 '20 at 04:04
@vgaurav excellent, I'll make it an answer so that we can mark this question closed and future folks can use it as a reference — Gene Burinsky, Feb 18 '20 at 05:19
Does this answer your question? [What does `ValueError: cannot reindex from a duplicate axis` mean?](https://stackoverflow.com/questions/27236275/what-does-valueerror-cannot-reindex-from-a-duplicate-axis-mean) — Gene Burinsky, Feb 18 '20 at 22:25

score 9 · Accepted Answer · answered Feb 18 '20 at 05:21

This error is often thrown because of duplicated column names (not necessarily duplicated values).

First, check whether any column names are duplicated: df.columns.duplicated().any()

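If it returns True, remove the duplicated columns, keeping the first occurrence of each name and assigning the result back (the expression returns a new dataframe rather than modifying df in place):

df = df.loc[:, ~df.columns.duplicated()]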

After you remove the duplicated columns, you should be able to run your groupby operation.
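As a rough illustration of the check and the fix described in this answer, here is a minimal sketch. The small frame and its repeated "PRD" column are hypothetical, made up only to mimic the situation in the question; the sketch does not try to reproduce the exact error message, which can vary between pandas versions.

```python
import pandas as pd

# Hypothetical frame: the column name "PRD" appears twice.
df = pd.DataFrame(
    [
        [9404000, 4040, 99999, 1],
        [9404000, 4040, 1556894, 2],
        [9404001, 4040, 1556896, 3],
    ],
    columns=["GRP", "CATG", "PRD", "PRD"],
)

# Boolean mask: True for the second and later occurrences of a column name.
dup_mask = df.columns.duplicated()
dup_names = df.columns[dup_mask].unique().tolist()

# Keep only the first occurrence of each column name.
deduped = df.loc[:, ~dup_mask]

# With unique column names, the groupby from the question runs normally.
result = deduped.groupby(["GRP", "CATG"], as_index=False)["PRD"].min()
print(dup_names)
print(result)
```

Listing `dup_names` first can help when the frame is wide (the asker mentions 79 columns), since it shows which names are repeated before anything is dropped.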

score 2 · Answer 2 · answered Dec 31 '21 at 09:39

Look at this. It helped me. It seems the index needs to be reset for a Series too.
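The linked post is not reproduced here, so the following is only a guess at the kind of situation the answer refers to: assigning a Series whose index contains repeated labels to a dataframe column, which forces pandas to align (reindex) the Series against the frame's index. The frame, the Series, and the column name "NEW" are all hypothetical.

```python
import pandas as pd

df = pd.DataFrame({"GRP": [9404000, 9404000, 9404001]})  # default index 0, 1, 2

# Hypothetical Series whose index repeats the label 0.
s = pd.Series([10, 20, 30], index=[0, 0, 1])

try:
    # Column assignment aligns s to df.index, which requires reindexing s.
    df["NEW"] = s
    raised = False
except ValueError:
    # pandas refuses to reindex from an index with duplicate labels.
    raised = True

# Resetting the Series index gives it fresh labels 0..n-1, so it aligns with df.
df["NEW"] = s.reset_index(drop=True)
print(raised)
print(df)
```

Note that `reset_index(drop=True)` pairs values with rows purely by position, so it is only appropriate when the Series is already in the same row order as the frame.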

Answered by lochanitis

While this link may answer the question, it is better to include the essential parts of the answer here and provide the link for reference. Link-only answers can become invalid if the linked page changes. - [From Review](/review/late-answers/30710519) – Sercan Jan 01 '22 at 15:52
@Sercan a) It does answer the question. It worked for me, and I knew I had no duplicate column names. b) True about links in general, but this is an SO URL, so it should work for a while. After some time pandas might get updated, possibly making the whole question irrelevant. – Simone Aug 10 '22 at 14:36

score 1 · Answer 3 · answered Oct 07 '21 at 20:01

Check for duplicates in your index as well. That was the problem with my dataframe. I found this link very helpful: Solve Pandas “ValueError: cannot reindex from a duplicate axis”
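A minimal sketch of such an index check, using a hypothetical three-row frame in which the label 1 is repeated. It assumes the existing index labels carry no meaning that needs to be preserved; if they do, deduplicating or renaming them would be a better fit than resetting.

```python
import pandas as pd

# Hypothetical frame whose index repeats the label 1.
df = pd.DataFrame({"PRD": [99999, 1556894, 1556895]}, index=[0, 1, 1])

# Quick yes/no check for repeated index labels.
has_dupes = not df.index.is_unique

# keep=False marks every row that shares a label, not just the later ones.
dupe_rows = df[df.index.duplicated(keep=False)]

# Replace the index with a fresh 0..n-1 range and discard the old labels.
clean = df.reset_index(drop=True)
print(has_dupes)
print(dupe_rows)
print(clean.index.tolist())
```

By default `duplicated()` uses `keep='first'`, which flags only the second and later occurrences; `keep=False` makes it easier to see every affected row side by side.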

Answered by mk_sch
