I am trying to use a Scala UDF in PySpark.
I have created a jar file containing a Scala UDF; the code looks like this:
package example

import org.apache.spark.sql.expressions.UserDefinedFunction
import org.apache.spark.sql.functions._

object UDF_Scala_code {
  def main(args: Array[String]): Unit = {
    getFun()
  }

  def getStringLength(s: String) = s.length

  def getFun(): UserDefinedFunction = udf(getStringLength _)
}
I am using Spark 2.4.
I get an error for the line below in my PySpark code (use_scala_udf1.py):
spark.udf.registerJavaFunction("get_col_len", "example.UDF_Scala_code.getFun", StringType())
The command I am running is:
spark2-submit --jars /path/ws_spark_scala.jar use_scala_udf1.py
ERROR:
pyspark.sql.utils.AnalysisException: u'Can not load class example.UDF_Scala_code.getFun, please make sure it is on the classpath;'
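For context, the relevant part of use_scala_udf1.py looks roughly like this (the DataFrame and its column name are sample data I added just to illustrate the intended usage):

from pyspark.sql import SparkSession
from pyspark.sql.types import StringType

spark = SparkSession.builder.appName("use_scala_udf1").getOrCreate()
sc = spark.sparkContext

# The registration that fails; the jar itself is passed via --jars.
spark.udf.registerJavaFunction("get_col_len", "example.UDF_Scala_code.getFun", StringType())

# Intended usage once the UDF is registered.
df = spark.createDataFrame([("abc",), ("de",)], ["value"])
df.selectExpr("get_col_len(value)").show()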
However, if I use the code below in PySpark instead, I get the expected result.
from pyspark.sql.column import Column, _to_java_column, _to_seq

def get_col_len(col):
    _string_length = sc._jvm.example.UDF_Scala_code.getFun()
    return Column(_string_length.apply(_to_seq(sc, [col], _to_java_column)))
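For reference, I call that wrapper like this (same made-up sample data as above):

df = spark.createDataFrame([("abc",), ("de",)], ["value"])
df.withColumn("value_len", get_col_len(df["value"])).show()  # prints 3 and 2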
Why can't registerJavaFunction load the class, and what is the correct way to register this Scala UDF from PySpark?
Thanks.