Java Reference
In-Depth Information
for some positive constant
g
. This expression is a polynomial in
g
. To minimize the number of
arithmetic operations, write the polynomial in the following algebraically equivalent form:
(…((
u
0
g
+
u
1
)
g
+
u
2
)
g
+
…
+
u
n
-
2
)
g
+
u
n
-
1
This way of evaluating a polynomial is called
Horner's method.
The following Java statements perform this computation for the string
s
and the
int
constant
g
:
int
hash = 0;
int
n = s.length();
for
(
int
i = 0; i < n; i++)
hash = g * hash + s.charAt(i);
The
i
th
character of the string is
s.charAt(i)
. Adding this character to the product
g
*
hash
actu-
ally adds the character's Unicode value. An explicit cast of
s.charAt(i)
to
int
is not necessary
and would not affect the result.
This computation can cause an overflow, particularly for long strings. Java ignores these
overflows and, for an appropriate choice of
g
, the result will be a reasonable hash code. Current
implementations of the method
hashCode
in Java's class
String
use this computation with 31 as
the value of
g
. Realize, however, that the overflows can produce a negative result. You can deal
with that when you compress the hash code into an appropriate index for the hash table.
Question 1
Calculate the hash code for the string
Java
when
g
is 31. Compare your result
with the value of the expression
"
Java
"
.hashCode()
.
21.9
The hash code for a primitive type.
This segment contains Java operations that might be unfamil-
iar to you. However, they are not essential to the rest of this chapter.
If the search key's data type is
int
, you can use the key itself as the hash code. If the search
key is an instance of either
byte
,
short
, or
char
, you can cast it to an
int
to get a hash code.
Thus, casting to an
int
is one way to generate a hash code.
For other primitive types, you manipulate their internal binary representations. If the search
key is an integer of type
long
, it contains 64 bits. An
int
has 32 bits. Simply casting the 64-bit
search key to an
int
—or performing a modulo 2
32
—would lose its first 32 bits. As a result, all
keys that differ in only their first 32 bits will have the same hash code and collide. For this rea-
son, ignoring part of a search key can be a problem.
Note:
Derive the hash code from the
entire
search key. Do not ignore part of it.
Instead of ignoring a part of a
long
search key, divide it into several pieces. Then combine
the pieces by using either addition or a bit-wise boolean operation such as
exclusive or
. This pro-
cess is called
folding
.
For example, let's divide a
long
search key into two 32-bit halves. To get the left half, we
can shift the search key to the right by a certain number of bits, or places. For example, if we
shift the 8-bit binary number 10101100 to the right by 4 bits, we will get 00001010. We have iso-
lated the number's left half and discarded its right half. If we now combine 00001010 with the
original value and ignore the left half of the result, we will effectively have combined the left and
right halves of the original key.