I have a DataFrame with only one row.
df = spark.createDataFrame([(1, 2, 10, 3, 4)], ['a', 'b', 'c', 'd', 'e'])
But the number of columns is large, about 20,000. Now I want to select the columns whose value is larger than a threshold, e.g. 5. I tried converting the DataFrame to a dict to do this, but hit a max heap size error.
Here, the expected output is:
+---+
| c|
+---+
| 10|
+---+
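Since there is only one row, one workaround I am considering (a sketch, not a confirmed solution) is to pull just that row back with `df.first().asDict()` and filter the column names in plain Python, then feed the surviving names into `df.select(*keep)`. The filtering step on the collected row would look like this, with the dict standing in for the result of `asDict()`:

```python
# Stand-in for the single collected row; in Spark this would come from
# df.first().asDict(), which brings back only the one row instead of
# converting the whole DataFrame to a dict.
row = {'a': 1, 'b': 2, 'c': 10, 'd': 3, 'e': 4}

threshold = 5
# Keep only the column names whose value exceeds the threshold.
keep = [name for name, value in row.items() if value > threshold]
# keep == ['c']; in Spark these names would then feed df.select(*keep).
```

Would this scale to ~20,000 columns, or is there a way to do the comparison without collecting the row?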