You need to explode the arrays level by level and select the struct fields into separate columns. For that you need the explode built-in function together with the select API and aliasing.
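For reference, this answer assumes the nested input schema below, reconstructed from the column names in the question (the string field types are illustrative; substitute your own):

root
 |-- trialTherapeuticAreas: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- id: string (nullable = true)
 |    |    |-- name: string (nullable = true)
 |    |    |-- trialDiseases: array (nullable = true)
 |    |    |    |-- element: struct (containsNull = true)
 |    |    |    |    |-- id: string (nullable = true)
 |    |    |    |    |-- name: string (nullable = true)
 |    |    |    |    |-- trialPatientSegments: array (nullable = true)
 |    |    |    |    |    |-- element: struct (containsNull = true)
 |    |    |    |    |    |    |-- id: string (nullable = true)
 |    |    |    |    |    |    |-- name: string (nullable = true)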
Code to try:

import org.apache.spark.sql.functions._

val finalDF = df
  // explode the outermost array so each therapeutic area becomes its own row
  .withColumn("trialTherapeuticAreas", explode(col("trialTherapeuticAreas")))
  // pull out the struct fields and explode the nested trialDiseases array
  .select(
    col("trialTherapeuticAreas.id").as("trialTherapeuticAreas_ID"),
    col("trialTherapeuticAreas.name").as("trialTherapeuticAreas_name"),
    explode(col("trialTherapeuticAreas.trialDiseases")).as("trialDiseases"))
  // same pattern one level down: keep the flattened columns, explode trialPatientSegments
  .select(
    col("trialTherapeuticAreas_ID"),
    col("trialTherapeuticAreas_name"),
    col("trialDiseases.id").as("trialDiseases_id"),
    col("trialDiseases.name").as("trialDiseases_name"),
    explode(col("trialDiseases.trialPatientSegments")).as("trialPatientSegments"))
  // finally, select the innermost struct fields into their own columns
  .select(
    col("trialTherapeuticAreas_ID"),
    col("trialTherapeuticAreas_name"),
    col("trialDiseases_id"),
    col("trialDiseases_name"),
    col("trialPatientSegments.id").as("trialPatientSegments_id"),
    col("trialPatientSegments.name").as("trialPatientSegments_name"))
That should give you the flattened output you're after.
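If the input matches the schema sketched above, finalDF.printSchema() should show the flattened layout (types again assumed):

root
 |-- trialTherapeuticAreas_ID: string (nullable = true)
 |-- trialTherapeuticAreas_name: string (nullable = true)
 |-- trialDiseases_id: string (nullable = true)
 |-- trialDiseases_name: string (nullable = true)
 |-- trialPatientSegments_id: string (nullable = true)
 |-- trialPatientSegments_name: string (nullable = true)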
You can also express the same transformation with three withColumn calls and a single select:
import org.apache.spark.sql.functions._

val finalDF = df
  // explode each level of nesting into its own column
  .withColumn("trialTherapeuticAreas", explode(col("trialTherapeuticAreas")))
  .withColumn("trialDiseases", explode(col("trialTherapeuticAreas.trialDiseases")))
  .withColumn("trialPatientSegments", explode(col("trialDiseases.trialPatientSegments")))
  // then flatten all the struct fields in one select
  .select(
    col("trialTherapeuticAreas.id").as("trialTherapeuticAreas_ID"),
    col("trialTherapeuticAreas.name").as("trialTherapeuticAreas_name"),
    col("trialDiseases.id").as("trialDiseases_id"),
    col("trialDiseases.name").as("trialDiseases_name"),
    col("trialPatientSegments.id").as("trialPatientSegments_id"),
    col("trialPatientSegments.name").as("trialPatientSegments_name"))
One caution about the second approach: each withColumn call introduces another projection in the logical plan, so long chains of withColumn on a huge dataset can bloat the plan and noticeably slow down analysis and optimization. The results are still deterministic, but the first approach, which does the flattening in a few select statements, keeps the plan smaller.
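If you want to sanity-check that both versions produce the same result and compare the plans they generate, explain and printSchema are quick ways to inspect them (an inspection aid, not a benchmark):

finalDF.printSchema()  // confirm the flattened column layout
finalDF.explain(true)  // prints the parsed, analyzed, optimized, and physical plans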