Given that the HashPartitioner docs say:
[HashPartitioner] implements hash-based partitioning using Java's Object.hashCode.
Say I want to partition DeviceData by its kind field:
case class DeviceData(kind: String, time: Long, data: String)
Would it be correct to partition an RDD[DeviceData] by overriding the deviceData.hashCode() method so that it uses only the hash code of kind?
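To make the idea concrete, here is a minimal sketch of the override I have in mind. It only illustrates the question and is not something I know to be the right approach; the sample values ("thermostat" and so on) are made up.

```scala
// Sketch: hash only on `kind`, ignoring `time` and `data`.
// Because hashCode is defined explicitly, the case class does not
// generate its own; equals is still the generated structural one.
case class DeviceData(kind: String, time: Long, data: String) {
  override def hashCode(): Int = kind.hashCode
}

// Made-up sample records of the same kind.
val a = DeviceData("thermostat", 1L, "21.5")
val b = DeviceData("thermostat", 2L, "22.0")

// The two records are different values but share a hash code.
println(a.hashCode == b.hashCode)
println(a == b)
```

With this override, two records of the same kind always hash identically, which is what I would hope a hash-based partitioner could exploit.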
HashPartitioner also takes a number-of-partitions parameter, so I am confused: do I need to know the number of kinds in advance, and what happens if there are more kinds than partitions?
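Here is a small plain-Scala sketch of what worries me. It assumes the partition index is the key's hash code taken modulo the number of partitions and kept non-negative, which is my understanding of hash partitioning; partitionFor is a hypothetical helper of my own, not a Spark API, and the kind names are invented.

```scala
// Hypothetical helper (not a Spark API): non-negative modulo of the hash code.
def partitionFor(key: Any, numPartitions: Int): Int = {
  val raw = key.hashCode % numPartitions
  if (raw < 0) raw + numPartitions else raw
}

// Invented kinds: five of them, but only three partitions.
val kinds = Seq("thermostat", "camera", "lock", "sensor", "doorbell")
val numPartitions = 3

// Group the kinds by the partition index they would land in.
val byPartition: Map[Int, Seq[String]] =
  kinds.groupBy(k => partitionFor(k, numPartitions))

// With more kinds than partitions, at least one partition holds several kinds.
byPartition.foreach { case (p, ks) =>
  println(s"partition $p -> ${ks.mkString(", ")}")
}
```

If that assumption holds, several kinds would end up in the same partition whenever there are more kinds than partitions, and distinct kinds could collide even when there are not.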
Is it correct that if I write partitioned data to disk, it will stay partitioned when it is read back?
My goal is to call
deviceDataRdd.foreachPartition((d: Iterator[DeviceData]) => ...)
and have only DeviceData instances with the same kind value in each iterator.