Why your code is not supposed to work:
- Before your foreach task starts, the whole closure of the function inside the foreach block is serialized and sent first to the master, then to each of the workers. This means each of them gets its own instance of mutable.LinkedHashMap as a copy of link.
- During the foreach block, each worker puts each of its items into its own copy of link.
- After the task is done, you still have an empty local link and several non-empty copies, one on each worker node.
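To see the copying effect without a cluster, here is a minimal sketch that round-trips a map through plain Java serialization, which is roughly what happens to objects captured by a closure when a task is shipped out. The names ClosureCopyDemo, serializedCopy and workerCopy are made up for illustration, and no Spark is involved; it only mimics the mechanism.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}
import scala.collection.mutable

object ClosureCopyDemo {
  // Round-trip a value through Java serialization to obtain an independent copy,
  // similar in spirit to what a worker receives (hypothetical helper, not a Spark API).
  def serializedCopy[T <: AnyRef](value: T): T = {
    val bytes = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bytes)
    out.writeObject(value)
    out.close()
    val in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray))
    try in.readObject().asInstanceOf[T] finally in.close()
  }

  // Returns (size of the local map, size of the "worker" copy).
  def run(): (Int, Int) = {
    val link = mutable.LinkedHashMap.empty[String, String] // the local map
    val workerCopy = serializedCopy(link)                  // what a "worker" would see
    workerCopy.put("id", "int")                            // the worker fills its own copy
    workerCopy.put("name", "string")
    (link.size, workerCopy.size)                           // the local map is untouched
  }
}
```

Calling ClosureCopyDemo.run() should show that the copy has grown while the local link is still empty, which is the same thing that happens to your map after the foreach.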
The moral is clear: don't try to fill local mutable collections from inside RDD operations. It's just not going to work.
One way to bring the whole collection to the local machine is the collect method.
You can use it like this:
val link = fieldTypeMapRDD.collect.toMap
or, if you need to preserve the order:
import scala.collection.immutable.ListMap
val link = ListMap(fieldTypeMapRDD.collect:_*)
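The difference between the two variants can be tried out without Spark. In this small sketch the collected array is a hypothetical stand-in for the result of fieldTypeMapRDD.collect, with made-up field names, and it assumes Scala 2.12 or later, where ListMap iterates in insertion order.

```scala
import scala.collection.immutable.ListMap

object OrderDemo {
  // Hypothetical stand-in for fieldTypeMapRDD.collect: an array of (field, type) pairs.
  val collected: Array[(String, String)] = Array(
    "id" -> "int", "name" -> "string", "age" -> "int",
    "email" -> "string", "city" -> "string", "zip" -> "string"
  )

  // A plain Map makes no promise about iteration order.
  val unordered: Map[String, String] = collected.toMap

  // A ListMap iterates in the order the pairs were inserted.
  val ordered: ListMap[String, String] = ListMap(collected: _*)
}
```

Both maps hold the same entries; only OrderDemo.ordered is guaranteed to iterate over them in the order of the original pairs.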
But if you really want a mutable collection, you can modify your code slightly so that the loop runs on the local machine instead of on the workers. Just change
fieldTypeMapRDD.foreach {
to
fieldTypeMapRDD.toLocalIterator.foreach {
See also this question.