Java : Iteration through a HashMap, which is more efficient?

Question

Given the following code, with two alternative ways to iterate through it,
is there any performance difference between these two methods?

        Map<String, Integer> map = new HashMap<String, Integer>();
        //populate map

        //alt. #1
        for (String key : map.keySet())
        {
            Integer value = map.get(key);
            //use key and value
        }

        //alt. #2
        for (Map.Entry<String, Integer> entry : map.entrySet())
        {
            String key = entry.getKey();
            Integer value = entry.getValue();
            //use key and value
        }

I am inclined to think that alt. #2 is the more efficient means of iterating through the entire map (but I could be wrong)

Just how big is the map? This smells like premature optimization. — Matt Ball, Apr 28 '11 at 23:55
@Matt I ask because I have several of them, and they are huge - usually to the tune of 10K-100K elements; there definitely is a good case for optimisation! — bguiz, Apr 28 '11 at 23:57
Update : Many answers appear to think this is premature optimisation. Do note that the above is indeed a SSCCE ( http://sscce.org/ ), and not the actual bit of code I am looking to optimise! — bguiz, Apr 29 '11 at 00:19
why is this premature optimization? The second options is clearly faster, but also clearly more self documenting. If all you wanted was the values, would you still use a keyset iterator? i would hope not. — MeBigFatGuy, Apr 29 '11 at 05:50
I think this becomes a more serious issue if the keys are large and/or have very involved .equals That would be interesting to benchmark: how involved have the .equals method to be so that the second option is at least twice as slow... — Marco Servetto, Jul 15 '17 at 21:13

Amol Katdare · Accepted Answer · 2022-05-03T21:38:04.050

Your second option is definitely more efficient since you are doing a lookup only once compared to n number of times in the first option.

But, nothing sticks better than trying it out when you can. So here goes -

(Not perfect but good enough to verify assumptions and on my machine anyway)

public static void main(String args[]) {

    Map<String, Integer> map = new HashMap<String, Integer>();
    // populate map

    int mapSize = 500000;
    int strLength = 5;
    for(int i=0;i<mapSize;i++)
        map.put(RandomStringUtils.random(strLength), RandomUtils.nextInt());
    
    long start = System.currentTimeMillis();
    // alt. #1
    for (String key : map.keySet()) {
        Integer value = map.get(key);
        // use key and value
    }
    System.out.println("Alt #1 took "+(System.currentTimeMillis()-start)+" ms");
    
    start = System.currentTimeMillis();
    // alt. #2
    for (Map.Entry<String, Integer> entry : map.entrySet()) {
        String key = entry.getKey();
        Integer value = entry.getValue();
        // use key and value
    }
    System.out.println("Alt #2 took "+(System.currentTimeMillis()-start)+" ms");
}

RESULTS (Some interesting ones)

With int mapSize = 5000; int strLength = 5;
Alt #1 took 26 ms
Alt #2 took 20 ms

With int mapSize = 50000; int strLength = 5;
Alt #1 took 32 ms
Alt #2 took 20 ms

With int mapSize = 50000; int strLength = 50;
Alt #1 took 22 ms
Alt #2 took 21 ms

With int mapSize = 50000; int strLength = 500;
Alt #1 took 28 ms
Alt #2 took 23 ms

With int mapSize = 500000; int strLength = 5;
Alt #1 took 92 ms
Alt #2 took 57 ms

...and so on

Please google for how to do a valid microbenchmark. (Key point: let hotspot do some warmup before the benchmark itself.) — Paŭlo Ebermann, Apr 29 '11 at 00:31
@Paulo - Fair enough and point noted. I retried using a warmup phase (essentially run the entire sequence once before running it again to measure) but the results are pretty much consistent. I guess it is because the put calls are warming things up anyway even without a warm up phase? — Amol Katdare, Apr 29 '11 at 00:38
+1 @amol : Thanks for the benchmarking/ solid evidence @Paulo : What particular standard would you recommend for a benchmarking? — bguiz, Apr 29 '11 at 01:49
This is quite old, but as someone asked about microbenchmarking, there'0s a thread discusing it: https://stackoverflow.com/questions/504103/how-do-i-write-a-correct-micro-benchmark-in-java — DGoiko, Jan 31 '20 at 15:04

SLaks · Answer 2 · 2011-04-29T00:06:44.117

11

The second snippet will be slightly faster, since it doesn't need to re-look-up the keys.

All HashMap iterators call the nextEntry method, which returns an Entry<K,V>.

Your first snippet discards the value from the entry (in KeyIterator), then looks it up again in the dictionary.

Your second snippet uses the key and value directly (from EntryIterator)

(Both keySet() and entrySet() are cheap calls)

edited Apr 29 '11 at 00:06

answered Apr 29 '11 at 00:00

SLaks

868,454
176
1,908
1,964

score 10 · Answer 3 · answered Oct 01 '14 at 18:51

Map:

Map<String, Integer> map = new HashMap<String, Integer>();

Beside the 2 options, there is one more.

1) keySet() - use it if you need to use only the keys

for ( String k : map.keySet() ) {
    ...
}

2) entrySet() - use it if you need both: keys & values

for ( Map.Entry<String, Integer> entry : map.entrySet() ) {
    String k = entry.getKey();
    Integer v = entry.getValue();
    ...
}

3) values() - use it if you need only the values

for ( Integer v : map.values() ) {
    ...
}

Alex - GlassEditor.com · Answer 4 · 2018-09-20T22:27:47.700

The most efficient ways (according to my benchmark) are to use the new HashMap.forEach() method added in Java 8 or HashMap.entrySet().forEach().

JMH Benchmark:

@Param({"50", "500", "5000", "50000", "500000"})
int limit;
HashMap<String, Integer> m = new HashMap<>();
public Test() {
}
@Setup(Level.Trial)
public void setup(){
    m = new HashMap<>(m);
    for(int i = 0; i < limit; i++){
        m.put(i + "", i);
    }
}
int i;
@Benchmark
public int forEach(Blackhole b){
    i = 0;
    m.forEach((k, v) -> { i += k.length() + v; });
    return i;
}
@Benchmark
public int keys(Blackhole b){
    i = 0;
    for(String key : m.keySet()){ i += key.length() + m.get(key); }
    return i;
}
@Benchmark
public int entries(Blackhole b){
    i = 0;
    for (Map.Entry<String, Integer> entry : m.entrySet()){ i += entry.getKey().length() + entry.getValue(); }
    return i;
}
@Benchmark
public int keysForEach(Blackhole b){
    i = 0;
    m.keySet().forEach(key -> { i += key.length() + m.get(key); });
    return i;
}
@Benchmark
public int entriesForEach(Blackhole b){
    i = 0;
    m.entrySet().forEach(entry -> { i += entry.getKey().length() + entry.getValue(); });
    return i;
}
public static void main(String[] args) throws RunnerException {
    Options opt = new OptionsBuilder()
            .include(Test.class.getSimpleName())
            .forks(1)
            .warmupIterations(25)
            .measurementIterations(25)
            .measurementTime(TimeValue.milliseconds(1000))
            .warmupTime(TimeValue.milliseconds(1000))
            .timeUnit(TimeUnit.MICROSECONDS)
            .mode(Mode.AverageTime)
            .build();
    new Runner(opt).run();
}

Results:

Benchmark            (limit)  Mode  Cnt      Score    Error  Units
Test.entries              50  avgt   25      0.282 ±  0.037  us/op
Test.entries             500  avgt   25      2.792 ±  0.080  us/op
Test.entries            5000  avgt   25     29.986 ±  0.256  us/op
Test.entries           50000  avgt   25   1070.218 ±  5.230  us/op
Test.entries          500000  avgt   25   8625.096 ± 24.621  us/op
Test.entriesForEach       50  avgt   25      0.261 ±  0.008  us/op
Test.entriesForEach      500  avgt   25      2.891 ±  0.007  us/op
Test.entriesForEach     5000  avgt   25     31.667 ±  1.404  us/op
Test.entriesForEach    50000  avgt   25    664.416 ±  6.149  us/op
Test.entriesForEach   500000  avgt   25   5337.642 ± 91.186  us/op
Test.forEach              50  avgt   25      0.286 ±  0.001  us/op
Test.forEach             500  avgt   25      2.847 ±  0.009  us/op
Test.forEach            5000  avgt   25     30.923 ±  0.140  us/op
Test.forEach           50000  avgt   25    670.322 ±  7.532  us/op
Test.forEach          500000  avgt   25   5450.093 ± 62.384  us/op
Test.keys                 50  avgt   25      0.453 ±  0.003  us/op
Test.keys                500  avgt   25      5.045 ±  0.060  us/op
Test.keys               5000  avgt   25     58.485 ±  3.687  us/op
Test.keys              50000  avgt   25   1504.207 ± 87.955  us/op
Test.keys             500000  avgt   25  10452.425 ± 28.641  us/op
Test.keysForEach          50  avgt   25      0.567 ±  0.025  us/op
Test.keysForEach         500  avgt   25      5.743 ±  0.054  us/op
Test.keysForEach        5000  avgt   25     61.234 ±  0.171  us/op
Test.keysForEach       50000  avgt   25   1142.416 ±  3.494  us/op
Test.keysForEach      500000  avgt   25   8622.734 ± 40.842  us/op

As you can see, HashMap.forEach and HashMap.entrySet().forEach() perform best for large maps and are joined by the for loop on the entrySet() for best performance on small maps.

The reason the keys methods are slower is probably because they have to lookup the value again for each entry, while the other methods just need to read a field in an object they already have to get the value. The reason that I would expect the iterator methods to be slower is that they are doing external iteration, which requires two method calls (hasNext and next) for each element as well as storing the iteration state in the iterator object, while the internal iteration done by forEach requires just one method call to accept.

You should profile on your target hardware with your target data and doing your target action in the loops to get a more accurate result.

score 5 · Answer 5 · answered Apr 29 '11 at 00:04

5

The latter is more efficient than the former. A tool like FindBugs will actually flag the former and suggest you to do the latter.

answered Apr 29 '11 at 00:04

Jonas Kongslund

5,058
2
28
27

1

+1 @Jonas : Thanks for mentioning FindBugs - learn something new everyday! – bguiz Apr 29 '11 at 00:21

score 3 · Answer 6 · answered Apr 29 '11 at 00:29

In general, the second one would be a bit faster for a HashMap. It will only really matter if you have lots of hash collisions, since then the get(key) call gets slower than O(1) - it gets O(k) with k being the number of entries in the same bucket (i.e. the number of keys with same hash code or a different hash code which gets still mapped to the same bucket - this depends on the capacity, size and load factor of the map as well).

The Entry-iterating variant does not have to do the lookup, thus it gets a bit faster here.

Another note: If the capacity of your map is a lot bigger than the actual size and you use iterations a lot, you might consider using LinkedHashMap instead. It provides O(size) instead O(size+capacity) complexity for a complete iteration (as well as a predictable iteration order). (You should still measure if this really gives an improvement, since the factors might vary. LinkedHashMap has a bigger overhead for creating the map.)

score 2 · Answer 7 · edited Oct 01 '14 at 19:19

bguiz,

I think (I don't know) that iterating the EntrySet (alternative 2) is marginally more efficient, simply because it doesn't hash each key in order to get it's value... Having said that, calculating the hash is an O(1) operation per entry, and therefore we're ONLY talking O(n) over the whole HashMap... but note that all this applies to HashMap only... other implementations of Map may have VERY different performance characteristics.

I do think you'd be "pushing it" to actually NOTICE the difference in performance. If you are concerned then why not setup a test-case to time both iteration techniques?

If you don't have a REAL, reported performance issue, then you're really worrying about not very much... A few clock ticks here and there won't affect the overall usability of your program.

I believe that many, many other aspects of the code are typically more important than outright performance. Of course some blocks are "performance critical", and this is known BEFORE it's even written, let-alone performance tested... but such cases are fairly rare. As a general approach it's better to focus on writing complete, correct, flexible, testable, reusable, readable, maintainable code... performance CAN be built in later, as need arises.

Version 0 should be AS SIMPLE AS POSSIBLE, without any "optimizations".

Please note that this definitely isn't a case of premature optimisation, and the software is definitely not version zero. It's existing, mature software, in need need of performance improvements. In my question, I have posted is a SSCCE ( http://sscce.org/ ) — bguiz, Apr 29 '11 at 00:18
You shouldn't pay lookup costs, even cheap ones, when you can avoid them altogether. — Azeroth2b, Aug 26 '22 at 22:56

Java : Iteration through a HashMap, which is more efficient?

7 Answers7