My Data looks like this:
   id | duration | action1 | action2 | ...
 ---------------------------------------------
    1 | 10       |   A     |   D
    1 | 10       |   B     |   E 
    2 | 25       |   A     |   E
    1 | 7        |   A     |   G
I want to group it by ID (which works great!):
df.rdd.groupBy(lambda x: x['id']).mapValues(list).collect()
And now I would like to group values within each group by duration to get something like this:
    [(id=1,
      ((duration=10,[(action1=A,action2=D),(action1=B,action2=E),
       (duration=7,(action1=A,action2=G)),
     (id=2,
       ((duration=25,(action1=A,action2=E)))]
And here is where I dont know how to do a nested group by. Any tips?
 
    