I have a PySpark DataFrame (called GPS) and I want to get each column's data as a list, with every row of the column becoming an element of that list, so I used the following list comprehension:
 ls = [x.GPS_COORDINATES for x in GPS.collect()]
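(For context, GPS looks roughly like the toy example below; the coordinate values are just placeholders, not my real data.)
 from pyspark.sql import SparkSession

 spark = SparkSession.builder.getOrCreate()
 # toy stand-in for the real data: a single string column of coordinates
 GPS = spark.createDataFrame(
     [('48.85|2.35',), ('40.71|-74.00',)],
     ['GPS_COORDINATES'],
 )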
This works as expected, but when I try to wrap this in a UDF and apply it to the whole DataFrame, as follows:
 from pyspark.sql.types import ArrayType, StringType
 from pyspark.sql.functions import udf
 def col_pipe_duplicater(col):
     ls = [x.GPS_COORDINATES for x in col.collect()]
     return ls
 pipe_remover_udf = udf(col_pipe_duplicater)#, ArrayType(StringType()))
 (
     GPS.select('GPS_COORDINATES',
         GPS.withColumn('New_col', pipe_remover_udf('GPS_COORDINATES')))
 )
I get the following error:
Invalid argument, not a string or column: DataFrame[GPS_COORDINATES: string, New_col: string] of type <class 'pyspark.sql.dataframe.DataFrame'>. For column literals, use 'lit', 'array', 'struct' or 'create_map' function.
Any idea how to debug this? (In case it matters, I am using the Jupyter Docker Stack pyspark-notebook image with Spark 2.3.1 on a MacBook Pro.)
Thanks a lot,
