Let say I have a DataFrame as follow :
case class SubClass(id:String, size:Int,useless:String)
case class MotherClass(subClasss: Array[SubClass])
val df = sqlContext.createDataFrame(List(
      MotherClass(Array(
        SubClass("1",1,"thisIsUseless"),
        SubClass("2",2,"thisIsUseless"),
        SubClass("3",3,"thisIsUseless")
      )),
      MotherClass(Array(
        SubClass("4",4,"thisIsUseless"),
        SubClass("5",5,"thisIsUseless")
      ))
    ))
The schema is :
root
 |-- subClasss: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- id: string (nullable = true)
 |    |    |-- size: integer (nullable = false)
 |    |    |-- useless: string (nullable = true)
I'm looking for a way to select only a subset of fields : id and size of the array column subClasss, but with keeping the nested array structure.
The resulting schema would be : 
root
     |-- subClasss: array (nullable = true)
     |    |-- element: struct (containsNull = true)
     |    |    |-- id: string (nullable = true)
     |    |    |-- size: integer (nullable = false)
I've tried to do a
df.select("subClasss.id","subClasss.size")
But this splits the array subClasss in two arrays : 
root
 |-- id: array (nullable = true)
 |    |-- element: string (containsNull = true)
 |-- size: array (nullable = true)
 |    |-- element: integer (containsNull = true)
Is there a way to keep the origin structure and just to eliminate the useless field ? Something that would look like : 
df.select("subClasss.[id,size]")
Thanks for your time.
 
    