I have a huge array (say, ParticleId[]) of unique integers (representing particle IDs) stored in memory in random order. I need to build a hash table to map each ID to its location inside the array, i.e., from ID to index. The IDs are not necessarily consecutive integers, so a simple look-up array is not a good solution.
I am currently using the C++11 unordered_map container to achieve this. The map is initialized with a loop:
unordered_map<ParticleId_t, ParticleIndex_t> ParticleHash;
ParticleHash.rehash(NumberOfParticles);    // request at least NumberOfParticles buckets
ParticleHash.reserve(NumberOfParticles);   // size the buckets for NumberOfParticles elements, honouring max_load_factor
for (ParticleIndex_t i = 0; i < NumberOfParticles; i++)
    ParticleHash[ParticleId[i]] = i;
ParticleId_t and ParticleIndex_t are simply typedef'd integer types.
NumberOfParticles can be huge (e.g., 1e9). The ParticleId[] array and NumberOfParticles are const as far as the hash table is concerned.
Currently it takes quite a lot of time to build the unordered_map as above. My questions are:
- Is unordered_map the optimal choice for this problem?
  - Would map be faster to initialize, although it may not be as efficient in the look-up?
- Is it possible to speed up the initialization?
  - Is it much faster to use ParticleHash.insert() than ParticleHash[]=, or any other function?
- Given that my keys are known to be unique integers, is there a way to optimize the map as well as the insertion? (A sketch of one idea I have in mind follows this list.)
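To make the last point concrete: since the keys are already unique, well-mixed integers, one idea is to hand unordered_map a trivial hash instead of std::hash (which may already be a pass-through on some implementations, so this may or may not help). Below is a minimal, untested sketch of what I mean; the typedefs and the std::vector wrapper are placeholders rather than my actual code.

#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

typedef std::int64_t ParticleId_t;     // placeholder typedefs; the real ones are project-specific
typedef std::int64_t ParticleIndex_t;

// Trivial hash: the IDs are already unique integers, so pass them through unchanged.
struct IdentityHash
{
    std::size_t operator()(ParticleId_t id) const
    { return static_cast<std::size_t>(id); }
};

std::unordered_map<ParticleId_t, ParticleIndex_t, IdentityHash>
BuildHash(const std::vector<ParticleId_t> &ParticleId)
{
    std::unordered_map<ParticleId_t, ParticleIndex_t, IdentityHash> hash;
    hash.reserve(ParticleId.size());   // one up-front bucket allocation instead of repeated rehashes
    for (ParticleIndex_t i = 0; i < static_cast<ParticleIndex_t>(ParticleId.size()); ++i)
        hash.emplace(ParticleId[i], i);   // emplace skips the default-construct-then-assign of operator[]
    return hash;
}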
I am considering Intel's concurrent_unordered_map to parallelize the construction. However, that would introduce a dependency on the Intel TBB library, which I would like to avoid if possible. Is there an easy solution using only native STL containers?
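For reference, the TBB route I have in mind would look roughly like the following sketch (untested; it relies on tbb::parallel_for and the thread-safe insert of tbb::concurrent_unordered_map, and it is precisely this extra dependency that I would prefer to avoid).

#include <cstddef>
#include <cstdint>
#include <vector>
#include <tbb/blocked_range.h>
#include <tbb/concurrent_unordered_map.h>
#include <tbb/parallel_for.h>

typedef std::int64_t ParticleId_t;     // placeholder typedefs
typedef std::int64_t ParticleIndex_t;

tbb::concurrent_unordered_map<ParticleId_t, ParticleIndex_t>
BuildHashParallel(const std::vector<ParticleId_t> &ParticleId)
{
    tbb::concurrent_unordered_map<ParticleId_t, ParticleIndex_t> hash;
    tbb::parallel_for(tbb::blocked_range<std::size_t>(0, ParticleId.size()),
                      [&](const tbb::blocked_range<std::size_t> &r)
                      {
                          for (std::size_t i = r.begin(); i != r.end(); ++i)
                              hash.insert({ParticleId[i], static_cast<ParticleIndex_t>(i)});  // thread-safe insert
                      });
    return hash;
}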
Update:
Now I've reverted to a plain sorted index table and rely on bsearch for lookup. At least the initialization of the table is now 20 times faster and can be easily parallelized.
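For completeness, this is roughly what that approach looks like: a sketch using std::sort and std::lower_bound (the C++ counterpart of the bsearch call I actually use); the typedefs are again placeholders.

#include <algorithm>
#include <cstdint>
#include <vector>

typedef std::int64_t ParticleId_t;     // placeholder typedefs
typedef std::int64_t ParticleIndex_t;

struct IdIndexEntry
{
    ParticleId_t    Id;
    ParticleIndex_t Index;
};

// Build: fill (id, index) pairs, then sort by id. The fill loop is embarrassingly
// parallel, and the sort can be swapped for any parallel sort.
std::vector<IdIndexEntry> BuildSortedTable(const std::vector<ParticleId_t> &ParticleId)
{
    std::vector<IdIndexEntry> table(ParticleId.size());
    for (ParticleIndex_t i = 0; i < static_cast<ParticleIndex_t>(ParticleId.size()); ++i)
        table[i] = {ParticleId[i], i};
    std::sort(table.begin(), table.end(),
              [](const IdIndexEntry &a, const IdIndexEntry &b) { return a.Id < b.Id; });
    return table;
}

// Lookup: binary search on the sorted ids; returns -1 if the id is absent.
ParticleIndex_t LookupIndex(const std::vector<IdIndexEntry> &table, ParticleId_t id)
{
    auto it = std::lower_bound(table.begin(), table.end(), id,
                               [](const IdIndexEntry &e, ParticleId_t key) { return e.Id < key; });
    return (it != table.end() && it->Id == id) ? it->Index : -1;
}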