I'm running Hortonworks 2.6 on a 5-node cluster and spark-submit to YARN (with 16 GB RAM and 4 cores).
I have an RDD transformation that runs fine in local mode but not with the yarn master URL.
rdd1 has values like:
id name date
1 john 10/05/2001 (dd/mm/yyyy)
2 steve 11/06/2015
I'd like to change the date format from dd/mm/yyyy to mm/dd/yy, so I wrote a method, transformations.transform, that I call inside RDD.map as follows:
val rdd2 = rdd1.map { rec => (rec.split(",")(0), transformations.transform(rec)) }
The transformations.transform method is as follows:
object transformations {
  def transform(t: String): String = {
    val msg = s">>> transformations.transform($t)"
    println(msg)
    msg
  }
}
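The transform above is only a logging stub. As a minimal sketch of the conversion described earlier (dd/mm/yyyy to mm/dd/yy), something like the following could work. It assumes Java 8's java.time is available and that each record is a comma-separated string "id,name,date", as the split(",") in the map suggests. DateReformat, reformatDate and reformatRecord are hypothetical names used only for illustration.

```scala
import java.time.LocalDate
import java.time.format.DateTimeFormatter

// Hypothetical helper, not part of the original code.
object DateReformat {
  // Input pattern: day/month/four-digit year, e.g. 10/05/2001
  private val inFmt = DateTimeFormatter.ofPattern("dd/MM/yyyy")
  // Output pattern: month/day/two-digit year, e.g. 05/10/01
  private val outFmt = DateTimeFormatter.ofPattern("MM/dd/yy")

  // Parses a dd/MM/yyyy date string and renders it as MM/dd/yy.
  def reformatDate(date: String): String =
    LocalDate.parse(date.trim, inFmt).format(outFmt)

  // Assumption: the record is "id,name,date" with the date in the third field.
  def reformatRecord(rec: String): String = {
    val fields = rec.split(",")
    fields.updated(2, reformatDate(fields(2))).mkString(",")
  }
}
```

With this, DateReformat.reformatRecord("1,john,10/05/2001") would give "1,john,05/10/01". This is separate from the problem described here, which occurs even with the logging stub.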
The map with transformations.transform works fine in local mode but not on the cluster. There, the output looks as if the map had been written as follows:
val rdd2 = rdd1.map { rec => (rec.split(",")(0), rec) }
rec does not seem to be passed to the transformations.transform method.
I do call an action to trigger transformations.transform, but with no luck:
val rdd3 = rdd2.count()
println(rdd3)
The count is printed, but transformations.transform does not appear to be called (I never see its println output). Why?
