there are 3,035,072 interned strings (since there are 1,009 buckets with an average of 3,008
strings per bucket). Ideally, the average bucket size should be 0 or 1. The size won't ever actually
be 0, just less than 0.5, but the calculation is done using integer arithmetic, so it
might be rounded down in the report. If the average is larger than 1, increase the string
table size.
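The arithmetic behind those figures can be sketched in a few lines of Java. This is only an illustration: the class and method names are hypothetical, and it assumes the report uses plain truncating integer division, which is why a table averaging well under one string per bucket would be shown as 0.

```java
final class StringTableStats {
    /** Average bucket size as a truncating integer report would show it (assumed behavior). */
    static long reportedAverage(long entries, long buckets) {
        return entries / buckets; // integer division discards the fraction
    }

    /** The same average computed in floating point, for comparison. */
    static double actualAverage(long entries, long buckets) {
        return (double) entries / buckets;
    }

    public static void main(String[] args) {
        // Figures from the text: 3,035,072 strings in 1,009 buckets.
        System.out.println(reportedAverage(3_035_072L, 1_009L)); // prints 3008

        // Hypothetical lightly loaded table: 400 strings in 1,009 buckets.
        System.out.println(reportedAverage(400L, 1_009L));       // prints 0
        System.out.println(actualAverage(400L, 1_009L) < 0.5);   // prints true
    }
}
```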
The number of interned strings an application has allocated (and their total size) can also be
obtained using the jmap command (this also requires JDK 7u6 or later):
% jmap -heap process_id
... other output ...
36361 interned Strings occupying 3247040 bytes.
The penalty for setting the size of the string table too high is minimal: each bucket takes only
4 or 8 bytes (depending on whether you have a 32- or 64-bit JVM), so having a few thousand
more entries than optimal is a one-time cost of a few kilobytes of native (not heap) memory.
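To put a number on that cost, the following sketch multiplies the bucket count by the per-bucket size quoted above. The class name, the method name, and the larger table size of 5,003 are hypothetical values chosen for illustration; the calculation covers only the bucket array, not the table entries themselves.

```java
final class StringTableCost {
    /** Native memory taken by the bucket array alone, in bytes. */
    static long bucketArrayBytes(long buckets, int bytesPerBucket) {
        return buckets * bytesPerBucket;
    }

    public static void main(String[] args) {
        long current = 1_009L; // bucket count used in the example above
        long larger = 5_003L;  // hypothetical, more generous table size

        // 4 bytes per bucket on a 32-bit JVM, 8 bytes on a 64-bit JVM.
        long extra32 = bucketArrayBytes(larger, 4) - bucketArrayBytes(current, 4);
        long extra64 = bucketArrayBytes(larger, 8) - bucketArrayBytes(current, 8);

        System.out.println(extra32 + " extra bytes on a 32-bit JVM"); // 15976
        System.out.println(extra64 + " extra bytes on a 64-bit JVM"); // 31952
    }
}
```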
On the topic of interning strings, what about using the intern() method to make the program run
faster, since interned strings can be compared via the == operator? That is a popular thought,
though in most cases it turns out to be a myth. The String.equals() method is pretty fast. It
first compares lengths, since unequal-length strings are never equal; if the strings have equal
length, it must scan them and compare the characters (at least until it finds a mismatch).
Comparing strings via the == operator is undeniably faster, but the cost of interning
the string must also be taken into consideration. That requires (among other things) calculating
the string's hash code, which means scanning the entire string and performing an operation on
each of its characters (just as the equals() method must do).
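The difference between the two comparisons can be seen in a small sketch. The class and helper names here are hypothetical; the strings are created with new String(...) purely to force distinct objects with identical contents.

```java
final class InternComparison {
    /** Reference comparison: true only if both variables point to the same object. */
    static boolean sameReference(String x, String y) {
        return x == y;
    }

    /** Interns both strings first, so equal contents yield the same canonical object. */
    static boolean sameAfterIntern(String x, String y) {
        return x.intern() == y.intern();
    }

    public static void main(String[] args) {
        String a = new String("performance"); // distinct object on the heap
        String b = new String("performance"); // same contents, different object

        System.out.println(sameReference(a, b));   // false: different objects
        System.out.println(a.equals(b));           // true: same characters
        System.out.println(sameAfterIntern(a, b)); // true: both map to one interned string
    }
}
```

Note that sameAfterIntern() pays for two intern() calls on every comparison, which is exactly the cost the paragraph above warns about.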
The only time a benefit in string comparison can be expected from using the intern() method is
if an application performs a lot of repeated comparisons on a set of strings of the same length. If
both strings have been previously interned, then the == comparison is faster; the cost of calling
the intern() method needn't be counted more than once. But in the general case, the costs are
mostly the same.
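A minimal sketch of that repeated-comparison case follows, with hypothetical class and method names. Each string is interned once up front, and every later comparison uses ==; this is only safe if all strings on both sides of the comparison have been interned.

```java
import java.util.ArrayList;
import java.util.List;

final class InternOnce {
    /** Interns every string once so that later comparisons can use ==. */
    static List<String> internAll(List<String> input) {
        List<String> out = new ArrayList<>(input.size());
        for (String s : input) {
            out.add(s.intern());
        }
        return out;
    }

    /** Counts matches by reference; both the list and the target must already be interned. */
    static int countMatches(List<String> interned, String internedTarget) {
        int count = 0;
        for (String s : interned) {
            if (s == internedTarget) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        List<String> raw = new ArrayList<>();
        raw.add(new String("aa"));
        raw.add(new String("ab"));
        raw.add(new String("aa"));

        List<String> interned = internAll(raw);  // interning cost paid once
        String target = new String("aa").intern();

        // Repeated lookups now cost only a reference comparison each.
        System.out.println(countMatches(interned, target)); // prints 2
    }
}
```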