I have two PySpark DataFrames:
1st dataframe: plants
 +-----+--------+
 |plant|station |
 +-----+--------+
 |Kech |    st1 |
 |Casa |    st2 |
 +-----+--------+
2nd dataframe: stations
 +-------+--------+
 |program|station |
 +-------+--------+
 |pr1    |    null|
 |pr2    |    st1 |
 +-------+--------+
What I want is to replace the null values in the second dataframe (stations) with all the values of the station column from the first dataframe, like this:
+-------+--------------+
|program|station       |
+-------+--------------+
|pr1    |    [st1, st2]|
|pr2    |    st1       |
+-------+--------------+
I did this:
stList = plants.select(F.col('station')).rdd.map(lambda x: x[0]).collect()
stations = stations.select(
    F.col('program'),
    F.when(stations.station.isNull(), stList).otherwise(stations.station).alias('station')
)
but it gives me an error because `F.when` doesn't accept a Python list as a parameter.