Every column of my DataFrame holds a list, and the values belonging to one entity sit at the same index in every list. The DataFrame looks like this:
column_1                                       | [2022-08-05 03:38...
column_2                                       | [inside, inside, ...
column_3                                       | [269344c6-c01c-45...
column_4                                       | [ff870660-57ce-11...
column_5                                       | [Mannheim, Mannhe...
column_6                                       | [26, 21, 2, 8]      
column_7                                       | [fa8103a0-57ce-11...
column_8                                       | [ATG1, ATG3, Variable1...
My approach:
from pyspark.sql import functions as F

# Get the column names
df_column_names = list(df.schema.names)
# Build the filter condition as a SQL expression
filter_func = "filter(geofenceeventtype, spatial_wi_df -> df.column_8 == 'Variable1')"
geofence_expr = f"transform(sort_array({filter_func}), x -> x."
geofence_prefix = "geofence_sorted"
# Extract the first matching element of each field into a new column
for col in df_column_names:
    df = df.withColumn(
        geofence_prefix + col,
        F.element_at(F.expr(geofence_expr + col.replace("_", ".") + ")"), 1),
    )
This way I want to create new columns that contain only the values of the entity 'Variable1', and then drop all rows that have no data for this entity.
The error message:
Can't extract value from lambda df#2345: need struct type but got string
So apparently there are rows where the value of the column is a single String rather than a StructType. How can I deal with this problem?
