Deduplicating common strings is usually a good idea to save memory.
But never use String.intern for deduplication!
- String.intern is a native method; each call pays additional JNI overhead.
- It bloats the internal JVM string table, which is shared with other parts of the JVM (e.g. class loading).
- The default capacity of the string table is not large enough, and the number of buckets is fixed (it can only be set at JVM startup with -XX:StringTableSize).
- It may increase GC pauses, since the JVM scans this internal hashtable, and possibly rehashes it, during a stop-the-world phase.
- More details in this presentation.
A regular HashMap or ConcurrentHashMap can be an order of magnitude faster for this task.
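Outside a benchmark, the same putIfAbsent idea can be wrapped in a small reusable helper. The class below is a minimal sketch; the name StringDeduplicator and its methods are hypothetical, not part of the JDK or any library, and it assumes the deduplicated strings should live as long as the helper itself:

```java
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical helper (not a JDK or library API): a thread-safe string
// deduplicator backed by ConcurrentHashMap, used instead of String.intern.
final class StringDeduplicator {
    // Maps each distinct string value to its canonical instance.
    private final ConcurrentHashMap<String, String> canonical = new ConcurrentHashMap<>();

    // Returns the canonical instance equal to s. The first instance passed in
    // becomes canonical; later equal strings are mapped to it.
    String dedup(String s) {
        String prev = canonical.putIfAbsent(s, s);
        return prev != null ? prev : s;
    }

    // Number of distinct strings currently held.
    int size() {
        return canonical.size();
    }

    public static void main(String[] args) {
        StringDeduplicator d = new StringDeduplicator();
        String a = new String("hello");
        String b = new String("hello"); // equal to a, but a different object
        System.out.println(a == b);          // false
        System.out.println(d.dedup(a) == a); // true: a becomes the canonical instance
        System.out.println(d.dedup(b) == a); // true: b is replaced by a
        System.out.println(d.size());        // 1
    }
}
```

Unlike the JVM string table, which can drop interned strings that are no longer referenced, this map holds strong references, so every distinct string stays reachable until the helper is discarded. Scoping the helper to the data being loaded keeps that growth bounded.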
The following JMH benchmark compares the performance of String.intern with [Concurrent]HashMap.putIfAbsent on a set of 1M strings:
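import java.util.HashMap;
import java.util.Random;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.TimeUnit;
import org.openjdk.jmh.annotations.*;
@State(Scope.Benchmark)
@BenchmarkMode(Mode.AverageTime)        // "avgt" in the results below
@OutputTimeUnit(TimeUnit.NANOSECONDS)   // "ns/op" in the results below
public class Dedup {
    // A plain HashMap is safe here only because JMH runs a single benchmark thread by default
    private static final HashMap<String, String> HM = new HashMap<>();
    private static final ConcurrentHashMap<String, String> CHM = new ConcurrentHashMap<>();
    // 1M pseudo-random numeric strings; SIZE must be a power of two (see nextString)
    private static final int SIZE = 1024 * 1024;
    private static final String[] STRINGS = new Random(0).ints(SIZE)
            .mapToObj(Integer::toString)
            .toArray(String[]::new);
    int idx;
    @Benchmark
    public String intern() {
        String s = nextString();
        return s.intern();
    }
    @Benchmark
    public String hashMap() {
        String s = nextString();
        // putIfAbsent returns the existing canonical instance, or null if s was just added
        String prev = HM.putIfAbsent(s, s);
        return prev != null ? prev : s;
    }
    @Benchmark
    public String concurrentHashMap() {
        String s = nextString();
        String prev = CHM.putIfAbsent(s, s);
        return prev != null ? prev : s;
    }
    private String nextString() {
        // Masking with SIZE - 1 wraps the index into [0, SIZE) because SIZE is a power of two
        return STRINGS[++idx & (SIZE - 1)];
    }
}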
The results on JDK 9 (smaller is better):
Benchmark                Mode  Cnt    Score    Error  Units
Dedup.concurrentHashMap  avgt   10   91.208 ±  0.569  ns/op
Dedup.hashMap            avgt   10   73.917 ±  0.602  ns/op
Dedup.intern             avgt   10  832.700 ± 73.402  ns/op
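Dividing the mean scores shows the size of the gap (ignoring the error bars, which are wide for intern):

```latex
\frac{832.700}{73.917} \approx 11.3 \quad (\text{intern vs. HashMap}), \qquad
\frac{832.700}{91.208} \approx 9.1 \quad (\text{intern vs. ConcurrentHashMap})
```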