Java Reference
In-Depth Information
it will produce a lot of collisions if the search keys contain the same letters, such as tod and
dot .
A better approach is to generate a hash code that takes the position of characters into con-
sideration. Specifically, let the hash code be
s 0 * b ( n - 1)
s 1 * b ( n - 2)
s n - 1
where s i is s.charAt(i) . This expression is a polynomial for some positive b , so this is called
a polynomial hash code . Using Horner's rule for polynomial evaluation (see Section 6.7), the
hash code can be calculated efficiently as follows:
+
+ c +
polynomial hash code
( c (( s 0 * b
+
s 1 ) b
+
s 2 ) b
+ c +
s n - 2 ) b
+
s n - 1
This computation can cause an overflow for long strings, but arithmetic overflow is ignored
in Java. You should choose an appropriate value b to minimize collisions. Experiments show
that good choices for b are 31, 33, 37, 39, and 41. In the String class, the hashCode is over-
ridden using the polynomial hash code with b being 31 .
27.3.3 Compressing Hash Codes
The hash code for a key can be a large integer that is out of the range for the hash-table index,
so you need to scale it down to fit in the index's range. Assume the index for a hash table is
between 0 and N-1 . The most common way to scale an integer to between 0 and N-1 is to use
h(hashCode) = hashCode % N
To ensure that the indices are spread evenly, choose N to be a prime number greater than 2 .
Ideally, you should choose a prime number for N . However, it is time consuming to find
a large prime number. In the Java API implementation for java.util.HashMap , N is set
to a value of the power of 2 . There is a good reason for this choice. When N is a value of the
power of 2 ,
h(hashCode) = hashCode % N
is the same as
h(hashCode) = hashCode & (N - 1 )
The ampersand, & , is a bitwise AND operator (see Appendix G, Bitwise Operations). The
AND of two corresponding bits yields a 1 if both bits are 1 . For example, assume N = 4 and
hashCode = 11 , 11 % 4 = 3 , which is the same as 01011 & 00011 = 11 . The & operator
can be performed much faster than the % operator.
To ensure that the hashing is evenly distributed, a supplemental hash function is also used
along with the primary hash function in the implementation of java.util.HashMap . This
function is defined as:
private static int supplementalHash( int h) {
h ^= (h >>> 20 ) ^ (h >>> 12 );
return h ^ (h >>> 7 ) ^ (h >>> 4 );
}
^ and >>> are bitwise exclusive-or and unsigned right-shift operations (also introduced in
Appendix G). The bitwise operations are much faster than the multiplication, division, and
remainder operations. You should replace these operations with the bitwise operations when-
ever possible.
The complete hash function is defined as:
h(hashCode) = supplementalHash(hashCode) % N
 
 
Search WWH ::




Custom Search