I had a dataframe like below, and I would like to remove duplications based on certain criteria. 1) If the startdate is greater than Month, it will be removed. 2) If the startdate is less than Month, keep the latest record.
>       COMP    Month       Startdate   bundle            result
> 0     TD3M    2018-03-01  2015-08-28  01_Essential      keep    
> 1     TD3M    2018-03-01  2018-07-17  04_Complete       remove
> 2     TD3M    2018-04-01  2015-08-28  01_Essential      keep
> 3     TD3M    2018-04-01  2018-07-17  04_Complete       remove
> 4     TD3M    2018-05-01  2015-08-28  01_Essential      keep
> 5     TD3M    2018-05-01  2018-07-17  04_Complete       remove
> 6     TD3M    2018-06-01  2015-08-28  01_Essential      keep
> 7     TD3M    2018-06-01  2018-07-17  04_Complete       remove
> 8     TD3M    2018-08-01  2015-08-28  01_Essential      remove
> 9     TD3M    2018-08-01  2018-07-17  04_Complete       keep
> 10    TD3M    2018-09-01  2015-08-28  01_Essential      remove
> 11    TD3M    2018-09-01  2018-07-17  04_Complete       keep
The expected output would be:
>       COMP    Month       Startdate   bundle            
> 0     TD3M    2018-03-01  2015-08-28  01_Essential      
> 2     TD3M    2018-04-01  2015-08-28  01_Essential     
> 4     TD3M    2018-05-01  2015-08-28  01_Essential     
> 6     TD3M    2018-06-01  2015-08-28  01_Essential     
> 9     TD3M    2018-08-01  2018-07-17  04_Complete  
> 11    TD3M    2018-09-01  2018-07-17  04_Complete          
 
     
    