I'd like to change the structure of a DataFrame in PySpark. The current schema is:
root
 |-- roster_id: long (nullable = true)
 |-- members: struct (nullable = true)
 |    |-- m10: struct (nullable = true)
 |    |    |-- name: string (nullable = true)
 |    |    |-- address: string (nullable = true)
 |    |    |-- hobby_1: string (nullable = true)
 |    |    |-- hobby_2: string (nullable = true)
 |    |-- m15: struct (nullable = true)
 |    |    |-- name: string (nullable = true)
 ~~~~~~~
I want to convert it to:
root
 |-- roster_id: long (nullable = true)
 |-- member_id: string (nullable = true)
 |-- name: string (nullable = true)
 |-- address: string (nullable = true)
 |-- hobby_1: string (nullable = true)
 |-- hobby_2: string (nullable = true)
But there are two problems:
・I do not know in advance which keys appear under "members" (m10, m15, and so on).
・Some fields under "members.X" (for example hobby_2) may be missing, depending on the member.
This seems difficult to me. Is there a way to do it?
Please also tell me if PySpark is not suitable for this.
Example
Input row:
{
  "roster_id": "abc",
  "members": {
    "m10": {
      "name": "John",
      "address": "Tokyo",
      "hobby_1": "Baseball",
      "hobby_2": "Tennis"
    },
    "m15": {
      "name": "Paul",
      "address": "NY",
      "hobby_1": "Music"
    }
  }
}
Desired output:
+---------+---------+----+-------+--------+-------+
|roster_id|member_id|name|address| hobby_1|hobby_2|
+---------+---------+----+-------+--------+-------+
|      abc|      m10|John|  Tokyo|Baseball| Tennis|
|      abc|      m15|Paul|     NY|   Music|   null|
+---------+---------+----+-------+--------+-------+
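To make the intended reshaping concrete, here is a plain-Python sketch (not PySpark) of what I want to happen to each record. The function name `flatten_roster` is hypothetical and only for illustration; it assumes each record is an ordinary dict shaped like the example above.

```python
def flatten_roster(record):
    """Turn one roster record into one flat dict per member."""
    members = record.get("members") or {}

    # Collect every field name seen under any member, in first-seen order,
    # because the member keys and their fields are not known in advance.
    fields = []
    for member in members.values():
        for key in (member or {}):
            if key not in fields:
                fields.append(key)

    rows = []
    for member_id, member in members.items():
        row = {"roster_id": record["roster_id"], "member_id": member_id}
        for key in fields:
            # A field missing for this member becomes None (null).
            row[key] = (member or {}).get(key)
        rows.append(row)
    return rows


record = {
    "roster_id": "abc",
    "members": {
        "m10": {"name": "John", "address": "Tokyo",
                "hobby_1": "Baseball", "hobby_2": "Tennis"},
        "m15": {"name": "Paul", "address": "NY", "hobby_1": "Music"},
    },
}

rows = flatten_roster(record)
for row in rows:
    print(row)
```

What I am looking for is the equivalent of this on a PySpark DataFrame, where the member keys become values of a member_id column and missing fields become null.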
