PySpark / Spark SQL Flatten Parent Child Table With Unknown Level of Hierarchy

Asked Mar 09 '21 at 18:53

Active Mar 09 '21 at 18:53

Viewed 1,255 times

I have a DataFrame with two columns (Parent and Child) that describe a hierarchy with an unknown number of levels, as shown below:

Using PySpark DataFrame functions or Spark SQL, the result should look something like this:

level1	level2	level3	level4	granularity_level
A	B	C	D	4
A	B	BB	Null	3
X	Y	Z	Null	3
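
To make the expected transformation concrete, here is a minimal plain-Python sketch of the flattening logic. The `edges` list is an assumption: it is the set of (Parent, Child) pairs inferred from the expected output above, not the original input table. The function name `flatten_hierarchy` is hypothetical, and the sketch assumes the hierarchy contains no cycles.

```python
# Assumed input: (parent, child) pairs inferred from the expected output table.
edges = [("A", "B"), ("B", "C"), ("C", "D"), ("B", "BB"), ("X", "Y"), ("Y", "Z")]


def flatten_hierarchy(edges):
    """Return one root-to-leaf path per leaf, padded with None to the deepest level.

    Each returned tuple is (level1, ..., levelN, granularity_level).
    Assumes the parent/child pairs form an acyclic hierarchy.
    """
    # Map every parent to the list of its direct children.
    children = {}
    for parent, child in edges:
        children.setdefault(parent, []).append(child)

    # Roots are parents that never appear in the child column.
    all_children = {child for _, child in edges}
    roots = sorted(p for p in children if p not in all_children)

    # Extend every path by one level per iteration until all paths end in a leaf.
    paths = []
    frontier = [[root] for root in roots]
    while frontier:
        next_frontier = []
        for path in frontier:
            kids = children.get(path[-1], [])
            if not kids:
                paths.append(path)  # last node has no children: path is complete
            for kid in kids:
                next_frontier.append(path + [kid])
        frontier = next_frontier

    # Pad shorter paths with None and append the number of levels actually used.
    depth = max((len(p) for p in paths), default=0)
    return [tuple(p + [None] * (depth - len(p)) + [len(p)]) for p in paths]


rows = flatten_hierarchy(edges)
for row in rows:
    print(row)
```

The number of iterations is driven by the data, so the depth does not have to be known in advance. In Spark, the same level-by-level idea could be expressed as repeated joins of the current paths against the Parent/Child DataFrame until no path gains a new child.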

Thanks & Regards


El Mehdi OUAFIQ

another similar question: https://stackoverflow.com/questions/62450917/build-a-hierarchy-from-a-relational-data-set-using-pyspark – mck Mar 09 '21 at 18:55
Thank you for your comment. They might seem similar, but unfortunately they are not, because in this case we know neither the highest nor the lowest level of granularity. Regards, – El Mehdi OUAFIQ Mar 09 '21 at 18:57
I think mck is right, what do you mean? – thebluephantom Mar 09 '21 at 19:36
How about this https://sqlandhadoop.com/how-to-implement-recursive-queries-in-spark/ and then apply a pivot or similar? – thebluephantom Mar 09 '21 at 19:39
Have you found a solution for this? – Laur Jan 09 '23 at 14:17

0 Answers