
I am trying to use a Scala UDF in PySpark.

I have created a jar file containing a Scala UDF. The code looks like this:

package example

import org.apache.spark.sql.expressions.UserDefinedFunction
import org.apache.spark.sql.functions._

object UDF_Scala_code {
  def main(args: Array[String]): Unit = {
    getFun()
  }

  def getStringLength(s: String): Int = s.length

  // Wraps the plain Scala function as a Spark UserDefinedFunction
  def getFun(): UserDefinedFunction = udf(getStringLength _)
}
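From the docs, my understanding is that `registerJavaFunction` loads a *class* by its fully qualified name and expects it to implement one of the `org.apache.spark.sql.api.java.UDF0`..`UDF22` interfaces, rather than pointing at a method that returns a `UserDefinedFunction`. A sketch of what I believe such a class would look like (the class name `StringLengthUDF` is hypothetical; I have not verified this fixes my case):

```scala
package example

import org.apache.spark.sql.api.java.UDF1

// Hypothetical wrapper class: registerJavaFunction("get_col_len",
// "example.StringLengthUDF", IntegerType()) would load this class
// by name and call its call() method per row.
class StringLengthUDF extends UDF1[String, Integer] {
  override def call(s: String): Integer = s.length
}
```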

I am using Spark 2.4.

I am getting an error for the line below in my PySpark code (use_scala_udf1.py):

spark.udf.registerJavaFunction("get_col_len", "example.UDF_Scala_code.getFun", StringType())

The command I am running is:

spark2-submit --jars /path/ws_spark_scala.jar use_scala_udf1.py

ERROR:

pyspark.sql.utils.AnalysisException: u'Can not load class example.UDF_Scala_code.getFun, please make sure it is on the classpath;'

However, if I use the code below in PySpark (inside a helper function, with the internal helpers imported from pyspark.sql.column), I get the expected result.

from pyspark.sql.column import Column, _to_java_column, _to_seq

_string_length = sc._jvm.example.UDF_Scala_code.getFun()
return Column(_string_length.apply(_to_seq(sc, [col], _to_java_column)))

Thanks.
