I know the difference between map and mapPartitions, which operate on individual elements and on iterators of elements, respectively.
When should I use which? If the overhead is similar, why would I ever use mapPartitions, since map is easier to write?
RDD.map applies a function to each element of an RDD, whereas RDD.mapPartitions applies a function to each partition of an RDD, passing it an iterator over that partition's elements.
map will not change the number of elements in an RDD, while mapPartitions might very well do so, because the function may return more or fewer elements than it receives.
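To make the difference concrete, here is a minimal plain-Python sketch. It is a toy model, not PySpark: an "RDD" is simply a list of partitions, and the names toy_map, toy_map_partitions, expensive_setup, per_element, per_partition and partition_sum are all hypothetical. (In real PySpark the corresponding calls are rdd.map(f) and rdd.mapPartitions(f).) It shows two things: a function passed to map runs once per element, so any costly setup inside it is paid once per element, whereas with mapPartitions that setup can be paid once per partition; and mapPartitions can change the element count.

```python
from typing import Callable, Iterable, Iterator, List

# Toy model (NOT Spark): an "RDD" is a list of partitions,
# and each partition is a plain list of elements.
Partitions = List[List[int]]

stats = {"setup_calls": 0}  # counts how often the costly setup runs


def expensive_setup() -> int:
    """Hypothetical stand-in for costly init, e.g. opening a DB connection."""
    stats["setup_calls"] += 1
    return 10  # pretend resource; here just an offset to add


def toy_map(parts: Partitions, f: Callable[[int], int]) -> Partitions:
    # f is called once per element, so the element count never changes.
    return [[f(x) for x in part] for part in parts]


def toy_map_partitions(
    parts: Partitions, f: Callable[[Iterator[int]], Iterable[int]]
) -> Partitions:
    # f is called once per partition with an iterator over its elements;
    # it may yield any number of outputs.
    return [list(f(iter(part))) for part in parts]


def count(parts: Partitions) -> int:
    return sum(len(part) for part in parts)


def per_element(x: int) -> int:
    offset = expensive_setup()  # paid once per element
    return x + offset


def per_partition(it: Iterator[int]) -> Iterator[int]:
    offset = expensive_setup()  # paid once per partition
    for x in it:
        yield x + offset


def partition_sum(it: Iterator[int]) -> Iterator[int]:
    yield sum(it)  # one output per partition, so the count shrinks


data: Partitions = [[1, 2, 3], [4, 5, 6, 7]]  # 2 partitions, 7 elements

stats["setup_calls"] = 0
print(toy_map(data, per_element), stats["setup_calls"])
# [[11, 12, 13], [14, 15, 16, 17]] 7

stats["setup_calls"] = 0
print(toy_map_partitions(data, per_partition), stats["setup_calls"])
# [[11, 12, 13], [14, 15, 16, 17]] 2

print(count(data), count(toy_map_partitions(data, partition_sum)))
# 7 2
```

Both transformations produce the same values here, but the setup ran 7 times with map and only 2 times with mapPartitions. That amortization of per-partition work is the usual reason to accept mapPartitions' slightly clunkier iterator-based signature.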
See also this answer and comments on a similar question.