I have a dataframe as follows
col1, col2, version_time, col3
root
 |-- col1: string (nullable = true)
 |-- col2: integer (nullable = true)
 |-- version_time: timestamp (nullable = true) 
 |-- col3: string (nullable = true)
Below are some sample rows
col1  col2  timestamp                 col3
 1     A    2021-05-09T13:53:20.219Z   B
 2     A    2021-01-09T13:53:20.219Z   C
 3     A    2021-02-09T13:53:20.219Z   D
 1     A    2020-05-09T13:53:20.219Z   E
 1     A    2019-05-09T13:53:20.219Z   F
What I want is to groupBy col1 and col2 with aggregate on max(timestamp) and also return all the columns.
col1  col2  timestamp                 col3
 1     A    2021-05-09T13:53:20.219Z   B
 2     A    2021-01-09T13:53:20.219Z   C
 3     A    2021-02-09T13:53:20.219Z   D
If i use groupBy on dataframe it will drop col3. I will have to join with original dataframe to get the value of col3
    col1  col2  timestamp                 
     1     A    2021-05-09T13:53:20.219Z
     2     A    2021-01-09T13:53:20.219Z
     3     A    2021-02-09T13:53:20.219Z
If I use Window.partitionBy, i will still have 5 rows with same timestamp value for col1 and col2, which is not what i want.
col1  col2  timestamp                 col3
 1     A    2021-05-09T13:53:20.219Z   B
 2     A    2021-01-09T13:53:20.219Z   C
 3     A    2021-02-09T13:53:20.219Z   D
 1     A    2021-05-09T13:53:20.219Z   E
 1     A    2021-05-09T13:53:20.219Z   F
Is there any other alternative?
 
    