If you're not familiar with the basic structure of hashtables and hashmaps, you may be wondering
what is meant by a fixed-size hashtable (particularly since the Java implementations of those
classes are not of fixed size).
Conceptually, a hashtable contains an array that can hold some number of entries (each element in
the array is called a bucket ). When something is stored in a hashtable, it is stored in the array pos-
ition determined by the object's hash value modulus the number of buckets. It is quite possible for
two objects with different hash values to map to the same bucket in this scheme, so each bucket is
really a linked list of all the stored items that map to that bucket. When two objects map to the
same bucket, it is called a collision.
As more and more objects are inserted into this table, there are more and more collisions; more
items get stored into each linked list. Finding an item then becomes a matter of searching through
a linked list. That can be very slow, particularly as the list gets longer.
The way around this is to size the hashtable so that it has more buckets (and, as a result, fewer
collisions). Many implementations do that dynamically; in fact, that is the way the Java Hasht-
able and HashMap classes work.
But other implementations—like the one internal to the JVM being discussed here—cannot resize
themselves; the size of their array is fixed when the map is created.
Starting in Java 7, the size of this table can be set when the JVM starts by using the flag -
XX:StringTableSize= N (which defaults to 1,009 or 60,013 as previously mentioned). If an
application will intern a lot of strings, this number should be increased. The string intern
table will operate most efficiently if that value is a prime number.
The performance of the intern() method is dominated by how well the string table size is
tuned. As an example, Table 7-3 shows the total time to create and intern 10 million ran-
domly created strings under various scenarios: