Hash tables - Data Structures and Problem Solving Using Java

Java Reference

In-Depth Information

20.4.2 analysis of quadratic probing

Quadratic probing

is implemented in

findPos . It uses the

previously

described trick to

avoid multiplica-

tions and mods.

Quadratic probing has not yet been mathematically analyzed, although we

know that it eliminates primary clustering. In quadratic probing, elements that

hash to the same position probe the same alternative cells, which is known as

secondary clustering. Again, the independence of successive probes cannot be

assumed. Secondary clustering is a slight theoretical blemish. Simulation

results suggest that it generally causes less than an extra one-half probe per

search and that this increase is true only for high load factors. Figure 20.6

illustrates the difference between linear probing and quadratic probing and

shows that quadratic probing does not suffer from as much clustering as does

linear probing.

Techniques that eliminate secondary clustering are available. The most

popular is double hashing, in which a second hash function is used to drive

the collision resolution. Specifically, we probe at a distance Hash 2 ( X ),

2 Hash 2 ( X ), and so on. The second hash function must be carefully chosen

(e.g., it should never evaluate to 0), and all cells must be capable of being

probed. A function such as Hash 2 ( X ) = R - ( X mod R ), with R a prime smaller

than M, generally works well. Double hashing is theoretically interesting

because it can be shown to use essentially the same number of probes as the

purely random analysis of linear probing would imply. However, it is some-

what more complicated than quadratic probing to implement and requires

careful attention to some details.

There seems to be no good reason not to use a quadratic probing strategy,

unless the overhead of maintaining a half-empty table is burdensome. That

would be the case in other programming languages if the items being stored

were very large.

In secondary clus-

tering, elements

that hash to the

same position

probe the same

alternative cells.

Secondary cluster-

ing is a minor theo-

retical blemish.

Double hashing is a

hashing technique

that does not suf-

fer from secondary

clustering. A sec-

ond hash function

is used to drive the

collision resolution.

separate chaining hashing

20.5

A popular and space-efficient alternative to quadratic probing is separate chain-

ing hashing in which an array of linked lists is maintained. For an array of

linked lists, L 0 , L 1 , ..., L M - 1 , the hash function tells us in which list to insert an

item X and then, during a find , which list contains X . The idea is that, although

searching a linked list is a linear operation, if the lists are sufficiently short, the

search time will be very fast. In particular, suppose that the load factor, N/M, is

Separate chaining

hashing is a space-

efficient alternative

to quadratic prob-

ing in which an

array of linked lists

is maintained. It is

less sensitive to

high load factors.

, making the

expected number of probes for an insertion or unsuccessful search

, which is not bounded by 1.0. Thus the average list has length

and the

expected number of probes for a successful search . The reason is that

a successful search must occur in a nonempty list, and in such a list we expect to

have to traverse halfway down the list. The relative cost of a successful search

⁄

Data Structures and Problem Solving Using Java

Search WWH ::

Custom Search

Home