I have this test data:
    val data = List(
      List(47.5335D),
      List(67.5335D),
      List(69.5335D),
      List(444.1235D),
      List(677.5335D)
    )
I expect the median to be 69.5335, but when I try to compute the exact median (passing a relative error of 0) with this code:

    df.stat.approxQuantile(column, Array(0.5), 0)

it returns 444.1235.
Why does this happen, and how can it be fixed?
I'm doing it like this:
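    import org.apache.spark.sql.Row
    import org.apache.spark.sql.types.{DataTypes, StructField, StructType}

    val data = List(
      List(47.5335D),
      List(67.5335D),
      List(69.5335D),
      List(444.1235D),
      List(677.5335D)
    )
    // Build a single-column DataFrame of non-nullable doubles
    val rdd = sparkContext.parallelize(data).map(Row.fromSeq(_))
    val schema = StructType(Array(
      StructField("value", DataTypes.DoubleType, false)
    ))
    val df = sqlContext.createDataFrame(rdd, schema)
    df.createOrReplaceTempView(tableName)

    // Query the temp view, then request the 0.5 quantile with relative error 0
    val df2 = sqlContext.sql(s"SELECT value FROM $tableName")
    val median = df2.stat.approxQuantile("value", Array(0.5), 0)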
So I'm creating a temporary view, querying it, and then calculating the result. This is just for testing.
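For reference, the expected value of 69.5335 can be cross-checked without Spark. The following is only a minimal sketch, assuming the values fit in memory; `MedianCheck` and `exactMedian` are hypothetical names used for illustration and are not part of any Spark API. It sorts the values and picks the middle one (or averages the two middle ones for an even count).

```scala
object MedianCheck {
  // Hypothetical helper (not a Spark API): exact median of a non-empty sequence.
  def exactMedian(values: Seq[Double]): Double = {
    require(values.nonEmpty, "median of an empty sequence is undefined")
    val sorted = values.sorted
    val n = sorted.length
    if (n % 2 == 1) sorted(n / 2)                  // odd count: middle element
    else (sorted(n / 2 - 1) + sorted(n / 2)) / 2.0 // even count: mean of the two middle elements
  }

  def main(args: Array[String]): Unit = {
    // Same five values as in the test data above, flattened to plain doubles
    val data = List(47.5335D, 67.5335D, 69.5335D, 444.1235D, 677.5335D)
    println(exactMedian(data)) // prints 69.5335
  }
}
```

With five values the middle element after sorting is the third one, which is why I expect 69.5335 rather than 444.1235.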
 
     
     
    