I have a DataFrame with two columns (Parent and Child) that describe a hierarchy of unknown depth, as shown below:
| Parent | Child |
|---|---|
| A | B |
| B | C |
| C | D |
| B | BB |
| X | Y |
| Y | Z |
Using PySpark DataFrame functions or Spark SQL, I would like to flatten it so that each root-to-leaf path becomes one row, something like:
| level1 | level2 | level3 | level4 | granularity_level |
|---|---|---|---|---|
| A | B | C | D | 4 |
| A | B | BB | Null | 3 |
| X | Y | Z | Null | 3 |
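
To pin down the logic I am after, here is a minimal plain-Python sketch (not Spark) of the transformation. It assumes the hierarchy has no cycles, that a root is any Parent that never appears as a Child, and that a leaf is any Child that never appears as a Parent. The function name `expand_paths` is just something I made up for illustration.

```python
from collections import defaultdict


def expand_paths(edges):
    """Return one tuple per root-to-leaf path, padded with None, plus its depth.

    Assumes the (parent, child) pairs form an acyclic hierarchy.
    """
    # Map each parent to its children, in input order.
    children = defaultdict(list)
    for parent, child in edges:
        children[parent].append(child)

    # Roots are parents that never appear as a child (first-seen order kept).
    all_children = {child for _, child in edges}
    roots = [p for p in dict.fromkeys(p for p, _ in edges) if p not in all_children]

    paths = []

    def walk(node, path):
        path = path + [node]
        if node not in children:  # leaf: no outgoing edges
            paths.append(path)
            return
        for child in children[node]:
            walk(child, path)

    for root in roots:
        walk(root, [])

    # Pad every path to the longest one and append its length.
    width = max((len(p) for p in paths), default=0)
    return [tuple(p + [None] * (width - len(p))) + (len(p),) for p in paths]


edges = [("A", "B"), ("B", "C"), ("C", "D"), ("B", "BB"), ("X", "Y"), ("Y", "Z")]
rows = expand_paths(edges)
for row in rows:
    print(row)
```

For the sample data this produces the three rows in the table above, with `None` where the table shows Null. What I need is the equivalent in PySpark or Spark SQL, where the number of levels is not known in advance.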
Thanks & Regards