Hash tables - Data Structures and Problem Solving Using Java

Java Reference

In-Depth Information

The find algorithm merely follows the same path as the insert algo-

rithm. If it reaches an empty slot, the item we are searching for is not found;

otherwise, it finds the match eventually. For example, to find 58, we start at

slot 8 (as indicated by the hash function). We see an item, but it is the wrong

one, so we try slot 9. Again, we have an item, but it is the wrong one, so we

try slot 0 and then slot 1 until we find a match. A find for 19 would involve

trying slots 9, 0, 1, and 2 before finding the empty cell in slot 3. Thus 19 is

not found.

The find algorithm

follows the same

probe sequence

as the insert

algorithm.

Standard deletion cannot be performed because, as with a binary search

tree, an item in the hash table not only represents itself, but it also connects

other items by serving as a placeholder during collision resolution. Thus, if

we removed 89 from the hash table, virtually all the remaining find opera-

tions would fail. Consequently, we implement lazy deletion, or marking

items as deleted rather than physically removing them from the table. This

information is recorded in an extra data member. Each item is either active

or deleted .

We must use lazy

deletion.

20.3.1 naive analysis of linear probing

To estimate the performance of linear probing, we make two assumptions:

The simplistic anal-

ysis of linear prob-

ing is based on the

assumption that

successive probes

are independent.

This assumption is

not true and thus

the analysis under-

estimates the costs

of searching and

insertion.

1.

The hash table is large

2.

Each probe in the hash table is independent of the previous probe.

Assumption 1 is reasonable; otherwise, we would not be bothering with a

hash table. Assumption 2 says that, if the fraction of the table that is full

is

λ

, each time we examine a cell the probability that it is occupied is also

λ

, independent of any previous probes. Independence is an important sta-

tistical property that greatly simplifies the analysis of random events.

Unfortunately, as discussed in Section 20.3.2, the assumption of indepen-

dence is not only unjustified, but it also is erroneous. Thus the naive anal-

ysis that we perform is incorrect. Even so, it is helpful because it tells us

what we can hope to achieve if we are more careful about how collisions

are resolved. As mentioned earlier in the chapter, the performance of the

hash table depends on how full the table is. Its fullness is given by the

load factor.

The load factor of a

probing hash table

is the fraction of

the table that is full.

It ranges from 0

(empty) to 1 (full).

definition: The load factor,

, of a probing hash table is the fraction of the table

that is full. The load factor ranges from 0 (empty) to 1 (completely full).

λ

We can now give a simple but incorrect analysis of linear probing in

Theorem 20.1.

Data Structures and Problem Solving Using Java

Search WWH ::

Custom Search

Home