Searching - Data Structures and Algorithm Analysis

Java Reference

In-Depth Information

is p(K;i) = (i 2 + i)=2, then every slot in the table will be visited by the probe

function.

Both pseudo-random probing and quadratic probing eliminate primary cluster-

ing, which is the problem of keys sharing substantial segments of a probe sequence.

If two keys hash to the same home position, however, then they will always follow

the same probe sequence for every collision resolution method that we have seen so

far. The probe sequences generated by pseudo-random and quadratic probing (for

example) are entirely a function of the home position, not the original key value.

This is because function p ignores its input parameter K for these collision resolu-

tion methods. If the hash function generates a cluster at a particular home position,

then the cluster remains under pseudo-random and quadratic probing. This problem

is called secondary clustering.

To avoid secondary clustering, we need to have the probe sequence make use of

the original key value in its decision-making process. A simple technique for doing

this is to return to linear probing by a constant step size for the probe function, but

to have that constant be determined by a second hash function, h 2 . Thus, the probe

sequence would be of the form p(K;i) = ih 2 (K). This method is called double

hashing.

Example9.11 Assume a hash table has size M = 101, and that there

are three keys k 1 , k 2 , and k 3 with h(k 1 ) = 30, h(k 2 ) = 28, h(k 3 ) = 30,

h 2 (k 1 ) = 2, h 2 (k 2 ) = 5, and h 2 (k 3 ) = 5. Then, the probe sequence

for k 1 will be 30, 32, 34, 36, and so on. The probe sequence for k 2 will

be 28, 33, 38, 43, and so on. The probe sequence for k 3 will be 30, 35,

40, 45, and so on. Thus, none of the keys share substantial portions of the

same probe sequence. Of course, if a fourth key k 4 has h(k 4 ) = 28 and

h 2 (k 4 ) = 2, then it will follow the same probe sequence as k 1 . Pseudo-

random or quadratic probing can be combined with double hashing to solve

this problem.

A good implementation of double hashing should ensure that all of the probe

sequence constants are relatively prime to the table size M. This can be achieved

easily. One way is to select M to be a prime number, and have h 2 return a value in

the range 1 h 2 (K) M 1. Another way is to set M = 2 m for some value m

and have h 2 return an odd value between 1 and 2 m .

Figure 9.9 shows an implementation of the dictionary ADT by means of a hash

table. The simplest hash function is used, with collision resolution by linear prob-

ing, as the basis for the structure of a hash table implementation. A suggested

project at the end of this chapter asks you to improve the implementation with

other hash functions and collision resolution policies.

Search WWH ::

Custom Search

Home