Java Reference
In-Depth Information
tableSize is 10,000, "junk" would be indexed to 9,227. In Section 20.2 we dis-
cuss implementation of the hash function for strings in detail.
The use of the hash function introduces a complication: Two or more dif-
ferent items can hash out to the same position, causing a collision. This situa-
tion can never be avoided because there are many more items than positions.
However, many methods are available for quickly resolving a collision. We
investigate three of the simplest: linear probing, quadratic probing, and sepa-
rate chaining. Each method is simple to implement, but each yields a different
performance, depending on how full the array is.
Because the hash
function is not one
to one, several
items collide at the
same index and
cause a collision .
hash function
20.2
Computing the hash function for strings has a subtle complication: The con-
version of the String s to x generates an integer that is almost certainly larger
than the machine can store conveniently—because 128 4 = 2 28 . This integer
size is only a factor of 8 from the largest int . Consequently, we cannot expect
to compute the hash function by directly computing powers of 128. Instead,
we use the following observation. A general polynomial
A 3 X 3
+
A 2 X 2
+
A 1 X 1
+
A 0 X 0
(20.1)
can be evaluated as
(
(
A ()
XA 2
+
)
XA 1
+
)
XA 0
+
(20.2)
Note that in Equation 20.2, we avoid computation of the polynomial
directly, which is good for three reasons. First, it avoids a large intermediate
result, which, as we have shown, overflows. Second, the calculation in the
equation involves only three multiplications and three additions; an N -degree
polynomial is computed in N multiplications and additions. These opera-
tions compare favorably with the computation in Equation 20.1. Third, the
calculation proceeds left to right ( A 3 corresponds to 'j' , A 2 to 'u' , and so
on, and X is 128).
However, an overflow problem persists: The result of the calculation is
still the same and is likely to be too large. But, we need only the result taken
mod tableSize . By applying the % operator after each multiplication (or addi-
tion), we can ensure that the intermediate results remain small. 1 The resulting
hash function is shown in Figure 20.1. An annoying feature of this hash func-
tion is that the mod computation is expensive. Because overflow is allowed
(and its results are consistent on a given platform), we can make the hash
By using a trick, we
can evaluate the
hash function effi-
ciently and without
overflow.
1. Section 7.4 contains the properties of the mod operation.
 
 
 
Search WWH ::




Custom Search