I'm working with deeply nested JSON data in Spark, and my goal is to flatten it. I know I can pull out a nested field with dot notation. For example, when id is nested inside the attributes column, I can select attributes.id like this:
df = df.select('attributes.id')
The problem is that df already has a column called id, and since Spark keeps only the part after the last . as the column name, I end up with duplicate column names. What is the best way to deal with this? Ideally the new column would be called attributes_id, to differentiate it from the existing id column.
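
To make the desired result concrete, here is a minimal sketch of the naming I'm after. It assumes that aliasing each selected column is an acceptable approach; the helper names flat_name and select_flat are hypothetical and only for illustration, while col and alias are the standard PySpark calls.

```python
def flat_name(path: str) -> str:
    """Turn a dotted field path into a flat column name,
    e.g. 'attributes.id' -> 'attributes_id'."""
    return path.replace(".", "_")


def select_flat(df, paths):
    """Select the given (possibly nested) fields from a Spark DataFrame,
    aliasing each one to its flattened name so that nested fields do not
    collide with top-level columns.

    Hypothetical helper: assumes none of the field names themselves
    contain a literal dot or an underscore-based clash.
    """
    # Imported lazily so flat_name can be used without Spark installed.
    from pyspark.sql import functions as F

    return df.select(*[F.col(p).alias(flat_name(p)) for p in paths])
```

With this, select_flat(df, ['id', 'attributes.id']) would be expected to produce the columns id and attributes_id instead of two columns both named id.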