Group a dataframe on one column and take the max of one column along with the corresponding value from another column

Question

I have a large dataframe with a pattern similar to the one below:

    X   Y   Z
0   a   p   2
1   a   q   5
2   a   r   6
3   a   s   3
4   b   w   10
5   b   z   20
6   b   y   9
7   b   x   20

And can be constructed as:

import pandas as pd

df = pd.DataFrame({
    'X': ['a', 'a', 'a', 'a', 'b', 'b', 'b', 'b'],
    'Y': ['p', 'q', 'r', 's', 'w', 'z', 'y', 'x'],
    'Z': [2, 5, 6, 3, 10, 20, 9, 20]
})

Now I want to group this dataframe by the first column, i.e. X, and take the max of column Z together with the corresponding value from column Y (the Y in the same row). If there are two max values in Z, I would like to take the alphabetically first value from Y.

So my expected result would look like:

X   Y   Z
a   r   6
b   x   20
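The selection rule amounts to a single sort key per group: prefer the larger Z, and among equal Z prefer the smaller Y. Here is a minimal pure-Python sketch of that rule on the rows from the table above. The helper name `pick_row` is hypothetical, for illustration only. It is meant as a reference check of the expected result, not as a fast solution for a large frame.

```python
from itertools import groupby
from operator import itemgetter

# Sample rows (X, Y, Z) copied from the table in the question
rows = [
    ('a', 'p', 2), ('a', 'q', 5), ('a', 'r', 6), ('a', 's', 3),
    ('b', 'w', 10), ('b', 'z', 20), ('b', 'y', 9), ('b', 'x', 20),
]

def pick_row(group):
    """Hypothetical helper: row with the largest Z, ties broken by the smallest Y."""
    # min over (-Z, Y): a larger Z gives a smaller -Z, and Y ascending breaks ties
    return min(group, key=lambda r: (-r[2], r[1]))

# itertools.groupby only merges consecutive equal keys, so sort by X first
result = [
    pick_row(list(g))
    for _, g in groupby(sorted(rows, key=itemgetter(0)), key=itemgetter(0))
]
print(result)  # [('a', 'r', 6), ('b', 'x', 20)]
```

Negating Z in the key is what allows "Z descending, Y ascending" to be expressed with a single `min`, since strings cannot be negated.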

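I have tried groupby('X', as_index=False).agg({'Z': 'max', 'Y': 'first'}), but this aggregates each column independently: it takes the max of Z and the first Y of each group, so the Y value does not necessarily come from the row that holds the max.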
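To see why that `agg` call falls short, here is a small sketch on the sample data from the table. `'max'` and `'first'` are applied to Z and Y separately, so the resulting Y is simply the first Y in each group, not the Y from the max-Z row.

```python
import pandas as pd

# Sample data matching the table in the question
df = pd.DataFrame({
    'X': ['a', 'a', 'a', 'a', 'b', 'b', 'b', 'b'],
    'Y': ['p', 'q', 'r', 's', 'w', 'z', 'y', 'x'],
    'Z': [2, 5, 6, 3, 10, 20, 9, 20],
})

# Each column is aggregated on its own: Z -> group max, Y -> first Y in the group
naive = df.groupby('X', as_index=False).agg({'Z': 'max', 'Y': 'first'})
print(naive)
# a -> Z=6 but Y='p' (wanted 'r'); b -> Z=20 but Y='w' (wanted 'x')
```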

Additionally, I know there is a SeriesGroupBy.nlargest(1) approach, but it would be too slow for my dataset.
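For reference, the `nlargest` route might look like the sketch below, using the same sample data. `SeriesGroupBy.nlargest` resolves ties by position (`keep='first'`), not alphabetically, so this sketch sorts by Y beforehand to make the alphabetically first Y win. The work is done group by group, which is likely why it is slow on a large frame with many groups.

```python
import pandas as pd

df = pd.DataFrame({
    'X': ['a', 'a', 'a', 'a', 'b', 'b', 'b', 'b'],
    'Y': ['p', 'q', 'r', 's', 'w', 'z', 'y', 'x'],
    'Z': [2, 5, 6, 3, 10, 20, 9, 20],
})

# Sort by Y so that, among tied Z values, the alphabetically first Y appears first
by_y = df.sort_values('Y', kind='stable')

# Top row per X group; keep='first' keeps the earliest of tied values
top = by_y.groupby('X')['Z'].nlargest(1, keep='first')

# The last index level holds the original row labels of the selected rows
result = df.loc[top.index.get_level_values(-1)]
print(result)
```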

Any suggestions on how I could proceed would be appreciated.

Thanks in advance:)

@jezrael It's not a dupe. And your answer in the comment will not work when there are two equal max values in column `Z`, because we also have to make sure that the alphabetically smallest value in column `Y` is selected. — Shubham Sharma, Mar 15 '21 at 13:02
@jezrael I don't see my exact answer in the dupe. And IMHO the marked dupe is not in any way related to this question — Shubham Sharma, Mar 15 '21 at 13:07
I too agree that this question is not particularly related to the one mentioned above — theProcrastinator, Mar 15 '21 at 13:21
@jezrael Check `And if there are two max values in Z, then I would like to take alphabetically first value from Y.` from the question — Shubham Sharma, Mar 15 '21 at 13:32
@jezrael Would you mind showing your answer, which is not publicly visible now? I think it could be useful for solving a problem. Thanks :) — theProcrastinator, Mar 19 '21 at 07:48
I know, but there is something else I want to see; you can hide it soon after — theProcrastinator, Mar 19 '21 at 07:49

Shubham Sharma · Accepted Answer · 2021-03-15T13:24:54.730


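Let us try sort_values + drop_duplicates: sort so that within each X the row with the largest Z comes first (ties in Z ordered by Y ascending), then keep only the first row of each X: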

df.sort_values(['X', 'Z', 'Y'], ascending=[True, False, True]).drop_duplicates('X')

   X  Y   Z
2  a  r   6
7  b  x  20
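Below is a self-contained version of the answer, using the data from the table in the question. For comparison, it also includes an alternative that is not part of the original answer: sort by Y, then use `groupby(...).idxmax()`. That call returns the label of the first maximal row per group, so after the Y sort the alphabetically first Y wins ties. Which of the two is faster depends on the data, so it is worth timing both on the real frame.

```python
import pandas as pd

df = pd.DataFrame({
    'X': ['a', 'a', 'a', 'a', 'b', 'b', 'b', 'b'],
    'Y': ['p', 'q', 'r', 's', 'w', 'z', 'y', 'x'],
    'Z': [2, 5, 6, 3, 10, 20, 9, 20],
})

# Approach from the answer: within each X put the largest Z first,
# order ties in Z by Y ascending, then keep the first row per X.
out_sort = (
    df.sort_values(['X', 'Z', 'Y'], ascending=[True, False, True])
      .drop_duplicates('X')
)

# Alternative (not from the answer): idxmax returns the label of the first
# occurrence of the max in each group, so pre-sorting by Y breaks ties alphabetically.
by_y = df.sort_values('Y', kind='stable')
out_idxmax = df.loc[by_y.groupby('X')['Z'].idxmax()]

print(out_sort)
print(out_idxmax)
```

On this sample both versions select rows 2 and 7 (a r 6 and b x 20).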

edited Mar 15 '21 at 13:24

answered Mar 15 '21 at 12:52

Shubham Sharma
