Hash tables - Data Structures and Problem Solving Using Java

Java Reference

In-Depth Information

tableSize is 10,000, "junk" would be indexed to 9,227. In Section 20.2 we dis-

cuss implementation of the hash function for strings in detail.

The use of the hash function introduces a complication: Two or more dif-

ferent items can hash out to the same position, causing a collision. This situa-

tion can never be avoided because there are many more items than positions.

However, many methods are available for quickly resolving a collision. We

investigate three of the simplest: linear probing, quadratic probing, and sepa-

rate chaining. Each method is simple to implement, but each yields a different

performance, depending on how full the array is.

Because the hash

function is not one

to one, several

items collide at the

same index and

cause a collision .

hash function

20.2

Computing the hash function for strings has a subtle complication: The con-

version of the String s to x generates an integer that is almost certainly larger

than the machine can store conveniently—because 128 4 = 2 28 . This integer

size is only a factor of 8 from the largest int . Consequently, we cannot expect

to compute the hash function by directly computing powers of 128. Instead,

we use the following observation. A general polynomial

A 3 X 3

+

A 2 X 2

+

A 1 X 1

+

A 0 X 0

(20.1)

can be evaluated as

(

A ()

XA 2

+

)

XA 1

+

)

XA 0

+

(20.2)

Note that in Equation 20.2, we avoid computation of the polynomial

directly, which is good for three reasons. First, it avoids a large intermediate

result, which, as we have shown, overflows. Second, the calculation in the

equation involves only three multiplications and three additions; an N -degree

polynomial is computed in N multiplications and additions. These opera-

tions compare favorably with the computation in Equation 20.1. Third, the

calculation proceeds left to right ( A 3 corresponds to 'j' , A 2 to 'u' , and so

on, and X is 128).

However, an overflow problem persists: The result of the calculation is

still the same and is likely to be too large. But, we need only the result taken

mod tableSize . By applying the % operator after each multiplication (or addi-

tion), we can ensure that the intermediate results remain small. 1 The resulting

hash function is shown in Figure 20.1. An annoying feature of this hash func-

tion is that the mod computation is expensive. Because overflow is allowed

(and its results are consistent on a given platform), we can make the hash

By using a trick, we

can evaluate the

hash function effi-

ciently and without

overflow.

1. Section 7.4 contains the properties of the mod operation.

Data Structures and Problem Solving Using Java

Search WWH ::

Custom Search

Home