I have a dataframe:
In [73]: data = {'ID':[1234,1234,1234,1234,1235,1235,1236,1237,1237,1237,1237], 'Date':['1/4/2001','1/4/2001','6/1/2003','6/1/2003', '7/1/1998', '7/1/1998', '4/23/2005', '7/1/2005','7/1/2005','7/1/2005','7/1/2005'], 'CalcYr': [2018, 2019, 2018, 2019, 2007, 2008, 2018, 2016, 2017, 2018, 2019], 'Values':[0.1,0.1,0.2,0.3,0.3,0.4,0.6,0,0.1,0,0.2]}
In[74]: df = pd.DataFrame(data)
In [75]: df
Out[75]:
ID Date CalcYr Values
0 1234 1/4/2001 2018 0.1
1 1234 1/4/2001 2019 0.1
2 1234 6/1/2003 2018 0.2
3 1234 6/1/2003 2019 0.3
4 1235 7/1/1998 2007 0.3
5 1235 7/1/1998 2008 0.4
6 1236 4/23/2005 2018 0.6
7 1237 7/1/2005 2016 0.0
8 1237 7/1/2005 2017 0.1
9 1237 7/1/2005 2018 0.0
10 1237 7/1/2005 2019 0.2
What I would like to do is keep only one row for the same values of ID and Date, where the CalcYr is maximum. For example, for ID of 1234, and Date of 1/4/2001, I'd only keep the row where CalcYr is 2019. The result would be:
ID Date CalcYr Values
0 1234 1/4/2001 2019 0.1
1 1234 6/1/2003 2019 0.3
2 1235 7/1/1998 2008 0.4
3 1236 4/23/2005 2018 0.6
4 1237 7/1/2005 2019 0.2