I don't believe you can pass in the DateTimeFormatter as an argument to the UDF. You can only pass in a Column. One solution would be to do:
val return_date = udf((str: String, format: String) => {
DateTimeFormat.forPatten(format).formatted(str))
})
And then:
val user_with_dates_formatted = users.withColumn(
"formatted_date",
return_date(users("ordering_date"), lit("yyyy/MM/dd"))
)
Honestly, though -- both this and your original algorithms have the same problem. They both parse yyyy/MM/dd using forPattern for every record. Better would be to create a singleton object wrapped around a Map[String,DateTimeFormatter], maybe like this (thoroughly untested, but you get the idea):
object DateFormatters {
var formatters = Map[String,DateTimeFormatter]()
def getFormatter(format: String) : DateTimeFormatter = {
if (formatters.get(format).isEmpty) {
formatters = formatters + (format -> DateTimeFormat.forPattern(format))
}
formatters.get(format).get
}
}
Then you would change your UDF to:
val return_date = udf((str: String, format: String) => {
DateFormatters.getFormatter(format).formatted(str))
})
That way, DateTimeFormat.forPattern(...) is only called once per format per executor.
One thing to note about the singleton object solution is that you can't define the object in the spark-shell -- you have to pack it up in a JAR file and use the --jars option to spark-shell if you want to use the DateFormatters object in the shell.