I have a Python list (p_list) of 0s and 1s with as many elements as a Spark dataframe has rows. The dataframe has only one column (every element looks like 'imaj7felb438l6hk', ...).
I am trying to add this list as a column to the Spark dataframe (df_cookie), but there is no key to join on. So far I have tried:
1) Converting df_cookie to an RDD. This doesn't work: it is really big and I run out of memory.
2) Converting df_cookie to a pandas dataframe. This doesn't work either (same reason as 1)).
3) Converting the list to a new dataframe and using monotonically_increasing_id() to get a common key to join the two on. This doesn't work either, as I end up with different ids in each dataframe.
Any suggestions? Here is what I tried for 3):
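from pyspark.sql.functions import monotonically_increasing_id
from pyspark.sql.types import IntegerType

# Stand-in for p_list: one integer per row of cookie
test_list = list(range(cookie.count()))
res = spark.createDataFrame(test_list, IntegerType()).toDF('ind')
df_res = res.withColumn('row', monotonically_increasing_id())
df_res.show(5)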
+---+---+
|ind|row|
+---+---+
|  0|  0|
|  1|  1|
|  2|  2|
|  3|  3|
|  4|  4|
+---+---+
# The same call on the real dataframe produces much larger ids
df_cookie = cookie.withColumn('row', monotonically_increasing_id())
df_cookie.show(5)
+--------------------+-----------+
|              cookie|        row|
+--------------------+-----------+
|    imaj7felb438l6hk|68719476736|
|hk3l641k5r1m2umv2...|68719476737|
|    ims1arqgxczr6rfm|68719476738|
|2t4rlplypc1ks1hnf...|68719476739|
|17gpx1x3j5eq03dpw...|68719476740|
+--------------------+-----------+
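As far as I understand the documentation, monotonically_increasing_id() puts the partition ID in the upper 31 bits and the record number within each partition in the lower 33 bits, which would explain the large values above. A minimal sketch in plain Python (decode_row_id is a hypothetical helper of mine, not a Spark function) that splits such an id under that assumption:

```python
# Assumption: ids follow the documented layout of monotonically_increasing_id():
# upper 31 bits = partition ID, lower 33 bits = record number within the partition.
RECORD_BITS = 33


def decode_row_id(row_id):
    """Hypothetical helper: split an id into (partition_id, record_number)."""
    partition_id = row_id >> RECORD_BITS
    record_number = row_id & ((1 << RECORD_BITS) - 1)
    return partition_id, record_number


# Ids taken from the two show(5) outputs above
for row_id in (0, 4, 68719476736, 68719476740):
    print(row_id, decode_row_id(row_id))
```

If that reading is right, the rows shown from df_cookie sit in partition 8 while those from df_res sit in partition 0, so the two id columns cannot be expected to match.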
Desired output:
+--------------------+-----------+
|              cookie|        ind|
+--------------------+-----------+
|    imaj7felb438l6hk|          0|
|hk3l641k5r1m2umv2...|          1|
|    ims1arqgxczr6rfm|          2|
|2t4rlplypc1ks1hnf...|          3|
|17gpx1x3j5eq03dpw...|          4|
+--------------------+-----------+