I have a data frame with a format like:
id | product
-------------
1  | A
1  | B
1  | C
2  | A
3  | A
3  | C 
What I want is a two-column data frame with one row per id, holding an array of every product owned by that id. I tried some code with mapPartitions(), but I get errors about not being able to infer the schema. I know I have to yield something back from the map function, but I can't figure out what.
I'm using Spark 1.6.
Edit
In case anyone else has this question, I went with the combineByKey() solution here: https://stackoverflow.com/a/27043562/1181412
It gave me more flexibility to work with the fields in a more granular way.
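For anyone who wants to see the shape of that approach, here is a rough sketch of the three functions combineByKey() expects, run against the sample data above. This is not the code from the linked answer. The function names and the combine_by_key_local helper are made up for illustration, and the helper only imitates in plain Python what Spark does across partitions, so the sketch runs without a cluster.

```python
# Sample (id, product) pairs mirroring the table in the question.
rows = [(1, "A"), (1, "B"), (1, "C"), (2, "A"), (3, "A"), (3, "C")]


def create_combiner(product):
    # First product seen for an id within a partition: start a new list.
    return [product]


def merge_value(products, product):
    # Another product for an id already seen in the same partition.
    products.append(product)
    return products


def merge_combiners(left, right):
    # Join the partial lists built for the same id on different partitions.
    left.extend(right)
    return left


def combine_by_key_local(partitions):
    # Hypothetical local stand-in for RDD.combineByKey (illustration only).
    partials = []
    for partition in partitions:
        acc = {}
        for key, value in partition:
            if key in acc:
                acc[key] = merge_value(acc[key], value)
            else:
                acc[key] = create_combiner(value)
        partials.append(acc)

    merged = {}
    for acc in partials:
        for key, combiner in acc.items():
            if key in merged:
                merged[key] = merge_combiners(merged[key], combiner)
            else:
                merged[key] = combiner
    return sorted(merged.items())


# Pretend the data is split across two partitions.
result = combine_by_key_local([rows[:3], rows[3:]])
print(result)
```

In PySpark the same three functions should be usable as rdd.combineByKey(create_combiner, merge_value, merge_combiners), assuming rdd holds (id, product) pairs, for example from df.rdd.map(lambda r: (r.id, r.product)). As far as I can tell, passing tuples of (id, list) to sqlContext.createDataFrame() with column names such as ["id", "products"] is what lets the schema be inferred when converting back to a data frame.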