I have a `Dataset` in Java Spark with data about a city's cabs. Among its several columns, it has:
- `day`, in the form `2016-04-02`, which is the day the cab picked up a customer.
- `vendor_id`, which is for example `1`.
- `hour`, in the form of `2` or `16`.
I want to find, for each vendor on each day, the hour with the maximum number of customers. So I think I should group by these three columns and count.
These are the first two rows after I group by `day`, `vendor_id`, `hour`:
+----------+---------+----+-----+
|day       |vendor_id|hour|count|
+----------+---------+----+-----+
|2016-01-01|1        |2   |116  |
|2016-01-01|1        |1   |110  |
+----------+---------+----+-----+
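To make the goal concrete, here is the reduction I want expressed in plain Java over counted rows like the ones above (this is only an illustration, not Spark code; the `Counted` record and the pipe-joined key are my own invention for the example):

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class MaxHourPerGroup {
    // One counted row, mirroring the table above: (day, vendor_id, hour, count).
    public record Counted(String day, int vendorId, int hour, long count) {}

    // For each (day, vendor_id) pair, keep the row whose count is largest.
    public static Map<String, Counted> maxHourPerDayVendor(List<Counted> rows) {
        Map<String, Counted> best = new HashMap<>();
        for (Counted r : rows) {
            String key = r.day() + "|" + r.vendorId();
            // merge keeps the existing row when its count is >= the new row's count
            best.merge(key, r, (a, b) -> a.count() >= b.count() ? a : b);
        }
        return best;
    }

    public static void main(String[] args) {
        List<Counted> rows = List.of(
                new Counted("2016-01-01", 1, 2, 116),
                new Counted("2016-01-01", 1, 1, 110));
        Counted top = maxHourPerDayVendor(rows).get("2016-01-01|1");
        // For vendor 1 on 2016-01-01, hour 2 with count 116 wins.
        System.out.println(top.hour() + " " + top.count());
    }
}
```

So for the sample rows, vendor 1 on 2016-01-01 should come out with hour 2 (count 116). I'm looking for the equivalent of this argmax-per-group in Spark.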
How can I get, for each (day, vendor_id) group created by the groupBy, the hour with the maximum count?
I have already seen this solved with a join, but that and other examples group on only one column, whereas here I group on three.
If possible, I would prefer Java code that uses the Spark libraries. Thank you for your time.