Java Reference
In-Depth Information
it will produce a lot of collisions if the search keys contain the same letters, such as
tod
and
dot
.
A better approach is to generate a hash code that takes the position of characters into con-
sideration. Specifically, let the hash code be
s
0
*
b
(
n
-
1)
s
1
*
b
(
n
-
2)
s
n
-
1
where
s
i
is
s.charAt(i)
. This expression is a polynomial for some positive
b
, so this is called
a
polynomial hash code
. Using Horner's rule for polynomial evaluation (see Section 6.7), the
hash code can be calculated efficiently as follows:
+
+ c +
polynomial hash code
(
c
((
s
0
*
b
+
s
1
)
b
+
s
2
)
b
+ c +
s
n
-
2
)
b
+
s
n
-
1
This computation can cause an overflow for long strings, but arithmetic overflow is ignored
in Java. You should choose an appropriate value
b
to minimize collisions. Experiments show
that good choices for
b
are 31, 33, 37, 39, and 41. In the
String
class, the
hashCode
is over-
ridden using the polynomial hash code with
b
being
31
.
27.3.3 Compressing Hash Codes
The hash code for a key can be a large integer that is out of the range for the hash-table index,
so you need to scale it down to fit in the index's range. Assume the index for a hash table is
between
0
and
N-1
. The most common way to scale an integer to between
0
and
N-1
is to use
h(hashCode) = hashCode % N
To ensure that the indices are spread evenly, choose
N
to be a prime number greater than
2
.
Ideally, you should choose a prime number for
N
. However, it is time consuming to find
a large prime number. In the Java API implementation for
java.util.HashMap
,
N
is set
to a value of the power of
2
. There is a good reason for this choice. When
N
is a value of the
power of
2
,
h(hashCode) = hashCode % N
is the same as
h(hashCode) = hashCode & (N -
1
)
The ampersand,
&
, is a bitwise AND operator (see Appendix G, Bitwise Operations). The
AND of two corresponding bits yields a
1
if both bits are
1
. For example, assume
N = 4
and
hashCode = 11
,
11 % 4 = 3
, which is the same as
01011 & 00011 = 11
. The
&
operator
can be performed much faster than the
%
operator.
To ensure that the hashing is evenly distributed, a supplemental hash function is also used
along with the primary hash function in the implementation of
java.util.HashMap
. This
function is defined as:
private static int
supplementalHash(
int
h) {
h ^= (h >>>
20
) ^ (h >>>
12
);
return
h ^ (h >>>
7
) ^ (h >>>
4
);
}
^
and
>>>
are bitwise exclusive-or and unsigned right-shift operations (also introduced in
Appendix G). The bitwise operations are much faster than the multiplication, division, and
remainder operations. You should replace these operations with the bitwise operations when-
ever possible.
The complete hash function is defined as:
h(hashCode) = supplementalHash(hashCode) % N
Search WWH ::
Custom Search