Java Reference
In-Depth Information
tableSize
is 10,000,
"junk"
would be indexed to 9,227. In Section 20.2 we dis-
cuss implementation of the hash function for strings in detail.
The use of the hash function introduces a complication: Two or more dif-
ferent items can hash out to the same position, causing a
collision.
This situa-
tion can never be avoided because there are many more items than positions.
However, many methods are available for quickly resolving a collision. We
investigate three of the simplest: linear probing, quadratic probing, and sepa-
rate chaining. Each method is simple to implement, but each yields a different
performance, depending on how full the array is.
Because the hash
function is not one
to one, several
items collide at the
same index and
cause a
collision
.
hash function
20.2
Computing the hash function for strings has a subtle complication: The con-
version of the
String s
to
x
generates an integer that is almost certainly larger
than the machine can store conveniently—because 128
4
= 2
28
. This integer
size is only a factor of 8 from the largest
int
. Consequently, we cannot expect
to compute the hash function by directly computing powers of 128. Instead,
we use the following observation. A general polynomial
A
3
X
3
+
A
2
X
2
+
A
1
X
1
+
A
0
X
0
(20.1)
can be evaluated as
(
(
A
()
XA
2
+
)
XA
1
+
)
XA
0
+
(20.2)
Note that in Equation 20.2, we avoid computation of the polynomial
directly, which is good for three reasons. First, it avoids a large intermediate
result, which, as we have shown, overflows. Second, the calculation in the
equation involves only three multiplications and three additions; an
N
-degree
polynomial is computed in
N
multiplications and additions. These opera-
tions compare favorably with the computation in Equation 20.1. Third, the
calculation proceeds left to right (
A
3
corresponds to
'j'
,
A
2
to
'u'
, and so
on, and
X
is 128).
However, an overflow problem persists: The result of the calculation is
still the same and is likely to be too large. But, we need only the result taken
mod
tableSize
. By applying the
%
operator after each multiplication (or addi-
tion), we can ensure that the intermediate results remain small.
1
The resulting
hash function is shown in Figure 20.1. An annoying feature of this hash func-
tion is that the mod computation is expensive. Because overflow is allowed
(and its results are consistent on a given platform), we can make the hash
By using a trick, we
can evaluate the
hash function effi-
ciently and without
overflow.
1. Section 7.4 contains the properties of the mod operation.
Search WWH ::
Custom Search