I have the following PySpark DataFrame:
+------------------+--------+-------+
|                ID|  Assets|Revenue|
+------------------+--------+-------+
|201542399349300619| 1633944|  32850|
|201542399349300629| 3979760| 850914|
|201542399349300634| 3402687|1983568|
|201542399349300724| 1138291|1097553|
|201522369349300122| 1401406|1010828|
|201522369349300137|   16948| 171534|
|201522369349300142|13474056|2285323|
|201522369349300202|  481045| 241788|
|201522369349300207|  700861|1185640|
|201522369349300227|  178479| 267976|
+------------------+--------+-------+
For each row, I want to get the rows whose Assets are within ±20% of that row's Assets. For example, for the first row (ID=201542399349300619), I want all the rows where Assets are between 80% and 120% of 1,633,944 (so roughly between 1,307,155 and 1,960,733):
+------------------+--------+-------+
|                ID|  Assets|Revenue|
+------------------+--------+-------+
|201542399349300619| 1633944|  32850|
|201522369349300122| 1401406|1010828|
+------------------+--------+-------+
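For a single row I can get this subset with a `between` filter (a minimal sketch; `target` is just the example row's Assets value):

    from pyspark.sql import functions as F

    target = 1633944  # Assets of the example row
    subset = df.filter(F.col("Assets").between(0.8 * target, 1.2 * target))
    subset.show()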
Using this subsetted table, I want to compute the average Assets and add it as a new column. For the example above, that would be (1633944 + 1401406) / 2 = 1517675:
+------------------+--------+-------+---------+
|                ID|  Assets|Revenue|AvgAssets|
+------------------+--------+-------+---------+
|201542399349300619| 1633944|  32850|  1517675|
+------------------+--------+-------+---------+
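To do this for every row at once, a self-join on the 20% band followed by a group-by average seems like one way to express it; here is a rough sketch of what I have in mind (the aliases `l`/`r` are my own naming):

    from pyspark.sql import functions as F

    left = df.alias("l")
    right = df.alias("r")

    # Pair each row with every row whose Assets fall within +/-20% of its
    # own Assets; every row matches itself, so no group is ever empty.
    joined = left.join(
        right,
        F.col("r.Assets").between(F.col("l.Assets") * 0.8, F.col("l.Assets") * 1.2),
    )

    result = (
        joined
        .groupBy("l.ID", "l.Assets", "l.Revenue")
        .agg(F.avg("r.Assets").alias("AvgAssets"))
    )
    result.show()

Is something like this the right approach, or is there a cheaper way than a full self-join?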