I have a large dataset grouped by column, row, year, potveg, and total. I am trying to get the max value of 'total' column in a specific year of a group. i.e., for the dataset below:
col      row    year    potveg  total
-125.0  42.5    2015    9       697.3
                2015    13      535.2
                2015    15      82.3
                2016    9       907.8
                2016    13      137.6
                2016    15      268.4
                2017    9       961.9
                2017    13      74.2
                2017    15      248.0
                2018    9       937.9
                2018    13      575.6
                2018    15      215.5
-135.0  70.5    2015    8       697.3
                2015    10      535.2
                2015    19      82.3
                2016    8       907.8
                2016    10      137.6
                2016    19      268.4
                2017    8       961.9
                2017    10      74.2
                2017    19      248.0
                2018    8       937.9
                2018    10      575.6
                2018    19      215.5
I would like the output to look like this:
col      row    year    potveg  total
-125.0  42.5    2015    9       697.3
                2016    9       907.8
                2017    9       961.9
                2018    9       937.9
-135.0  70.5    2015    8       697.3
                2016    8       907.8
                2017    8       961.9
                2018    8       937.9
I tried this:
df.groupby(['col', 'row', 'year', 'potveg']).agg({'total': 'max'})
and this:
df.groupby(['col', 'row', 'year', 'potveg'])['total'].max()
but they do not seem to work because the output has too many rows. I think the issue is the 'potveg' column which is a subgroup. I am not sure how to select rows containing max value of 'total'.
 
     
    