How can I get the first non-null value in each column from a group by? I tried combining first with coalesce, F.first(F.coalesce("code")), but I don't get the desired behavior: I seem to get back the value from the first row of the group, even when it is null.
from pyspark import SparkContext
from pyspark.sql import SQLContext
from pyspark.sql import functions as F
sc = SparkContext("local")
sqlContext = SQLContext(sc)
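# Sample data: within id 'a', the first non-null code is 'code1'
# and the first non-null name is 'name2'.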
df = sqlContext.createDataFrame([
    ("a", None, None),
    ("a", "code1", None),
    ("a", "code2", "name2"),
], ["id", "code", "name"])
I tried:
(df
  .groupby("id")
  .agg(F.first(F.coalesce("code")),
       F.first(F.coalesce("name")))
  .collect())
Desired output:
[Row(id='a', code='code1', name='name2')]
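For reference, a minimal sketch of one approach that seems to produce this output, using the ignorenulls argument of pyspark.sql.functions.first (note that without an explicit ordering, which row counts as "first" within a group is not guaranteed after a shuffle; it holds here because the toy data sits in a single local partition):

(df
  .groupby("id")
  .agg(F.first("code", ignorenulls=True).alias("code"),  # skip leading nulls in the group
       F.first("name", ignorenulls=True).alias("name"))
  .collect())
# expected: [Row(id='a', code='code1', name='name2')]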