I have sort of a hard time understanding an implementation detail from java-9 ImmutableCollections.SetN; specifically why is there a need to increase the inner array twice.
Suppose you do this:
Set.of(1,2,3,4) // 4 elements, but internal array is 8
More exactly I perfectly understand why this is done (a double expansion) in case of a HashMap - where you never (almost) want the load_factor to be one. A value of !=1 improves search time as entries are better dispersed to buckets for example.
But in case of an immutable Set - I can't really tell. Especially since the way an index of the internal array is chosen.
Let me provide some details. First how the index is searched:
int idx = Math.floorMod(pe.hashCode() ^ SALT, elements.length);
pe is the actual value we put in the set. SALT is just 32 bits generated at start-up, once per JVM (this is the actual randomization if you want). elements.length for our example is 8 (4 elements, but 8 here - double the size).
This expression is like a negative-safe modulo operation. Notice that the same logical thing is done in HashMap for example ((n - 1) & hash) when the bucket is chosen.
So if elements.length is 8 for our case, then this expression will return any positive value that is less than 8 (0, 1, 2, 3, 4, 5, 6, 7).
Now the rest of the method:
while (true) {
E ee = elements[idx];
if (ee == null) {
return -idx - 1;
} else if (pe.equals(ee)) {
return idx;
} else if (++idx == elements.length) {
idx = 0;
}
}
Let's break it down:
if (ee == null) {
return -idx - 1;
This is good, it means that the current slot in the array is empty - we can put our value there.
} else if (pe.equals(ee)) {
return idx;
This is bad - slot is occupied and the already in place entry is equal to the one we want to put. Sets can't have duplicate elements - so an Exception is later thrown.
else if (++idx == elements.length) {
idx = 0;
}
This means that this slot is occupied (hash collision), but elements are not equal. In a HashMap this entry would be put to the same bucket as a LinkedNode or TreeNode - but not the case here.
So index is incremented and the next position is tried (with the small caveat that it moves in a circular way when it reaches the last position).
And here is the question: if nothing too fancy (unless I'm missing something) is being done when searching the index, why is there a need to have an array twice as big? Or why the function was not written like this:
int idx = Math.floorMod(pe.hashCode() ^ SALT, input.length);
// notice the diff elements.length (8) and not input.length (4)