Spark DataFrame schema:
In [177]: testtbl.printSchema()
root
|-- Date: long (nullable = true)
|-- Close: double (nullable = true)
|-- Volume: double (nullable = true)
I wish to apply a scalar-valued function to a column of testtbl. Suppose I wish to calculate the average of the 'Close' column. For an RDD, I would do something like
rdd.fold(0, lambda x,y: x+y)
But testtbl.Close is not an RDD; it is a Column object with limited functionality. The rows of testtbl can be accessed as an RDD, but a single column cannot. So how do I apply add, or a user-defined function, to a single column?
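One possible direction, offered as a minimal sketch rather than a definitive answer: either use a built-in aggregate on the column, or drop down to the DataFrame's underlying RDD of Row objects and fold over the extracted values. The helper names `fold_mean` and `close_mean` are hypothetical, and the sketch assumes a PySpark version where `DataFrame.rdd` and `pyspark.sql.functions.avg` are available. `fold_mean` repeats the same fold logic on a plain Python list so it can be checked without a cluster.

```python
from functools import reduce


def fold_mean(values):
    """Mean of a plain Python iterable via one fold over (sum, count) pairs.

    Hypothetical helper: it mirrors the fold idea locally, without Spark.
    Returns None for an empty input.
    """
    total, count = reduce(
        lambda acc, v: (acc[0] + v, acc[1] + 1), values, (0.0, 0)
    )
    return total / count if count else None


def close_mean(df):
    """Average the 'Close' column of a Spark DataFrame in two ways.

    Assumes df has a numeric 'Close' column, as in the schema above,
    with at least one non-null value.
    """
    # Imported lazily so the rest of this sketch runs without PySpark.
    from pyspark.sql import functions as F

    # 1) Built-in aggregate applied to a single column.
    builtin = df.agg(F.avg("Close")).first()[0]

    # 2) Extract the column from the underlying RDD of Rows, then fold.
    #    Nulls are filtered out because the column is nullable.
    closes = df.rdd.map(lambda row: row.Close).filter(lambda v: v is not None)
    via_rdd = closes.fold(0.0, lambda x, y: x + y) / closes.count()

    return builtin, via_rdd
```

With the table above, `close_mean(testtbl)` would return the same average computed both ways. Note that fold with addition alone yields a sum, so the division by the count is what turns it into an average.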