I have one df like this:
>>> df1
        col_1   col_2    size_col  other_col
0        aaa     abc       4          zxc
1        aaa     abc       3          xcv
2        aaa     abc       1          cvb
3        bbb     bbc       7          vbn
4        bbb     bbc       3          bnm
5        ccc     cbc       1          asd
6        ddd     dbc       9          sdf
7        ccc     cbc       3          dfg
8        ccc     cbc       1          fgh
and I want a df like this:
>>> df2
        col_1   col_2    size_col  other_col
0        aaa     abc       4          zxc
3        bbb     bbc       7          vbn
6        ddd     dbc       9          sdf
7        ccc     cbc       3          dfg
Explanation:
I want to drop all the rows where col_1 and col_2 have duplicate values, retaining only the row where size_col is greatest within each duplicate bunch. So, from the example above, for the rows where col_1 and col_2 have aaa and abc, I need to retain the row where size_col has the biggest value. Put another way, I need to group by the col_1 and col_2 columns, then for each group retain only the row where size_col has the biggest value for that group.
How do I do this efficiently for a df with around 5 million rows and 7 columns?
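For reference, here is one common way to express this transformation (a sketch, not necessarily the fastest option at 5 million rows): use groupby().idxmax() on size_col to get the index label of the maximal row in each (col_1, col_2) group, then select those rows with .loc.

```python
import pandas as pd

# Rebuild the example df1 from the question.
df1 = pd.DataFrame({
    "col_1": ["aaa", "aaa", "aaa", "bbb", "bbb", "ccc", "ddd", "ccc", "ccc"],
    "col_2": ["abc", "abc", "abc", "bbc", "bbc", "cbc", "dbc", "cbc", "cbc"],
    "size_col": [4, 3, 1, 7, 3, 1, 9, 3, 1],
    "other_col": ["zxc", "xcv", "cvb", "vbn", "bnm", "asd", "sdf", "dfg", "fgh"],
})

# For each (col_1, col_2) group, idxmax returns the index label of the
# first row holding the group's maximum size_col, so ties resolve to
# the earliest occurrence.
idx = df1.groupby(["col_1", "col_2"])["size_col"].idxmax()

# Select those rows; sorting the labels restores the original row order.
df2 = df1.loc[sorted(idx)]
print(df2)
```

This keeps rows 0, 3, 6, and 7, matching the desired df2 above. An alternative with the same result is `df1.sort_values("size_col", ascending=False).drop_duplicates(["col_1", "col_2"])`; which is faster at scale is worth benchmarking on the real data.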