I have the following dataframe, but I cannot work out how to extract all the columns of the first row of each group:
+--------------------+------------+--------+
|           timestamp|nanos       |file_idx|
+--------------------+------------+--------+
|2018-09-07 05:00:...|    64044267|      1 |
|2018-09-07 05:00:...|    64044267|      2 |
|2018-09-07 05:00:...|    58789223|      3 |
+--------------------+------------+--------+
How do I extract the row with the largest file_idx for each combination of timestamp and nanos? I've tried using groupBy, but it only returns the columns in my group-by clause, whereas in reality this table contains 160 columns and I need all of them.
The desired outcome for the above example would be:
+--------------------+------------+--------+
|           timestamp|nanos       |file_idx|
+--------------------+------------+--------+
|2018-09-07 05:00:...|    64044267|      2 |
|2018-09-07 05:00:...|    58789223|      3 |
+--------------------+------------+--------+