Using Pyspark 2.2
I have a spark DataFrame with multiple columns. I need to input 2 columns to a UDF and return a 3rd column
Input:
+-----+------+
|col_A| col_B|
+-----+------+
|  abc|abcdef|
|  abc|     a|
+-----+------+
Both col_A and col_B are StringType()
Desired output:
+-----+------+-------+
|col_A| col_B|new_col|
+-----+------+-------+
|  abc|abcdef|    abc|
|  abc|     a|      a|
+-----+------+-------+
I want new_col to be a substring of col_A with the length of col_B.
I tried
udf_substring = F.udf(lambda x: F.substring(x[0],0,F.length(x[1])), StringType())
df.withColumn('new_col', udf_substring([F.col('col_A'),F.col('col_B')])).show()
But it gives the TypeError: Column is not iterable.
Any idea how to do such manipulation?
 
    