I created a PySpark DataFrame using the following code:
testlist = [
             {"category":"A","name":"A1"}, 
             {"category":"A","name":"A2"}, 
             {"category":"B","name":"B1"},
             {"category":"B","name":"B2"}
]
spark_df = spark.createDataFrame(testlist)
Result:
category    name
A           A1
A           A2
B           B1
B           B2
I want to concatenate the names within each category so that the result looks like this:
category    name
A           A1, A2
B           B1, B2
I tried the following code, which does not work:
spark_df.groupby('category').agg('name', lambda x:x + ', ')
Can anyone help identify what I am doing wrong and suggest the best way to do this?