I have a dataframe like this:
----------------------------------------------
| User_ID |      Timestamp      | Article_ID |
----------------------------------------------
| 121212  | 2018-01-15 10:00:00 |      1     |
| 121212  | 2018-01-15 10:05:00 |      11    |
| 121212  | 2018-01-15 10:10:00 |      12    |
| 989898  | 2018-01-15 17:30:00 |      100   |
| 989898  | 2018-01-15 17:40:00 |      200   |
| 989898  | 2018-01-15 17:50:00 |      1     |
| 989898  | 2018-01-15 17:55:00 |      11    |
|...      |                     |            |
----------------------------------------------
Now I want the full row with the minimum Timestamp for each User_ID. The result should be:
----------------------------------------------
| User_ID |      Timestamp      | Article_ID |
----------------------------------------------
| 121212  | 2018-01-15 10:00:00 |      1     |
| 989898  | 2018-01-15 17:30:00 |      100   |
|...      |                     |            |
----------------------------------------------
I tried the following:
from pyspark.sql import functions as F
df.groupBy('User_ID').agg(F.min('Timestamp')).show()
That gives me the minimum Timestamp per User_ID, but the column 'Article_ID' is missing from the output. How can I keep the whole row? Can someone please help me?
 
    