I am trying to flatten a complex JSON structure containing nested arrays and struct elements, using a generic function that should work for any JSON file with any schema.
Below is part of a sample JSON schema that I want to flatten:
root
 |-- Data: struct (nullable = true)
 |    |-- Record: struct (nullable = true)
 |    |    |-- FName: string (nullable = true)
 |    |    |-- LName: long (nullable = true)
 |    |    |-- Address: struct (nullable = true)
 |    |    |    |-- Applicant: array (nullable = true)
 |    |    |    |    |-- element: struct (containsNull = true)
 |    |    |    |    |    |-- Id: long (nullable = true)
 |    |    |    |    |    |-- Type: string (nullable = true)
 |    |    |    |    |    |-- Option: long (nullable = true)
 |    |    |    |-- Location: string (nullable = true)
 |    |    |    |-- Town: long (nullable = true)
 |    |    |-- IsActive: boolean (nullable = true)
 |-- Id: string (nullable = true)
to
root
 |-- Data_Record_FName: string (nullable = true)
 |-- Data_Record_LName: long (nullable = true)
 |-- Data_Record_Address_Applicant_Id: long (nullable = true)
 |-- Data_Record_Address_Applicant_Type: string (nullable = true)
 |-- Data_Record_Address_Applicant_Option: long (nullable = true)
 |-- Data_Record_Address_Location: string (nullable = true)
 |-- Data_Record_Address_Town: long (nullable = true)
 |-- Data_Record_IsActive: boolean (nullable = true)
 |-- Id: string (nullable = true)
I am using the code below, as suggested in this thread:
How to flatten a struct in a Spark dataframe?
from pyspark.sql.functions import col

def flatten_df(nested_df, layers):
    flat_cols = []
    nested_cols = []
    flat_df = []
    # Split top-level columns into flat vs. struct columns.
    flat_cols.append([c[0] for c in nested_df.dtypes if c[1][:6] != 'struct'])
    nested_cols.append([c[0] for c in nested_df.dtypes if c[1][:6] == 'struct'])
    # Expand each struct column into prefixed top-level columns.
    flat_df.append(nested_df.select(flat_cols[0] +
                                    [col(nc + '.' + c).alias(nc + '_' + c)
                                     for nc in nested_cols[0]
                                     for c in nested_df.select(nc + '.*').columns])
                   )
    # Repeat layer by layer until the requested depth is reached.
    for i in range(1, layers):
        flat_cols.append([c[0] for c in flat_df[i-1].dtypes if c[1][:6] != 'struct'])
        nested_cols.append([c[0] for c in flat_df[i-1].dtypes if c[1][:6] == 'struct'])
        flat_df.append(flat_df[i-1].select(flat_cols[i] +
                                           [col(nc + '.' + c).alias(nc + '_' + c)
                                            for nc in nested_cols[i]
                                            for c in flat_df[i-1].select(nc + '.*').columns])
                       )
    return flat_df[-1]

my_flattened_df = flatten_df(jsonDF, 10)
my_flattened_df.printSchema()
But it doesn't work for array elements: the `dtypes` check only matches type strings starting with `'struct'`, so `array` columns are treated as flat and passed through untouched. With the code above I get the output below. How can I modify this code so that it handles arrays as well?
root
 |-- Data_Record_FName: string (nullable = true)
 |-- Data_Record_LName: long (nullable = true)
 |-- Data_Record_Address_Applicant: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- Id: long (nullable = true)
 |    |    |-- Type: string (nullable = true)
 |    |    |-- Option: long (nullable = true)
 |-- Data_Record_Address_Location: string (nullable = true)
 |-- Data_Record_Address_Town: long (nullable = true)
 |-- Data_Record_IsActive: boolean (nullable = true)
 |-- Id: string (nullable = true)
This is not a duplicate, as there is no existing post about a generic function for flattening a complex JSON schema that includes arrays.