Consider the following DataFrame:
root
 |-- values: array (nullable = true)
 |    |-- element: double (containsNull = true)
with content:
+-----------+
|     values|
+-----------+
|[1.0, null]|
+-----------+
Now I want to pass the values column to a UDF:
val inspect = udf((data: Seq[Double]) => {
  data.foreach(println)
  println()
  data.foreach(d => println(d))
  println()
  data.foreach(d => println(d == null))
  ""
})
df.withColumn("dummy", inspect($"values"))
I'm really confused by the output of the println statements above:
1.0
null

1.0
0.0

false
false
My questions:
- Why is foreach(println) not giving the same output as foreach(d => println(d))?
- How can the Double be null in the first println statement? I thought Scala's Double cannot be null.
- How can I filter null values in my Seq other than by filtering out 0.0, which isn't really safe? Should I use Seq[java.lang.Double] as the input type of the UDF and then filter nulls? (This works, but I'm unsure whether it is the way to go.)
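For reference, the Seq[java.lang.Double] variant from the last point could look roughly like the sketch below. It is plain Scala with no Spark dependency, so it only simulates the sequence the UDF would receive. The object name NullDoubles and the helper dropNulls are made up for illustration.

```scala
object NullDoubles {
  // Hypothetical helper: drop null entries from a sequence of boxed doubles,
  // then unbox the remaining values to Scala's primitive Double.
  def dropNulls(data: Seq[java.lang.Double]): Seq[Double] =
    data.filter(_ != null).map(_.doubleValue)

  def main(args: Array[String]): Unit = {
    // Simulates the array [1.0, null] from the DataFrame above.
    val boxed: Seq[java.lang.Double] = Seq(java.lang.Double.valueOf(1.0), null)
    println(dropNulls(boxed)) // prints List(1.0)
  }
}
```

Inside Spark, the same filter-then-unbox body would go into a UDF declared with Seq[java.lang.Double] as its parameter type, which is the variant I tried.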
Note that I'm aware of this question, but mine is specific to array-type columns.