Java Reference
In-Depth Information
The
find
algorithm merely follows the same path as the
insert
algo-
rithm. If it reaches an empty slot, the item we are searching for is not found;
otherwise, it finds the match eventually. For example, to find 58, we start at
slot 8 (as indicated by the hash function). We see an item, but it is the wrong
one, so we try slot 9. Again, we have an item, but it is the wrong one, so we
try slot 0 and then slot 1 until we find a match. A
find
for 19 would involve
trying slots 9, 0, 1, and 2 before finding the empty cell in slot 3. Thus 19 is
not found.
The
find
algorithm
follows the same
probe sequence
as the
insert
algorithm.
Standard deletion cannot be performed because, as with a binary search
tree, an item in the hash table not only represents itself, but it also connects
other items by serving as a placeholder during collision resolution. Thus, if
we removed 89 from the hash table, virtually all the remaining
find
opera-
tions would fail. Consequently, we implement
lazy deletion,
or marking
items as deleted rather than physically removing them from the table. This
information is recorded in an extra data member. Each item is either
active
or
deleted
.
We must use
lazy
deletion.
20.3.1
naive analysis of linear probing
To estimate the performance of linear probing, we make two assumptions:
The simplistic anal-
ysis of linear prob-
ing is based on the
assumption that
successive probes
are independent.
This assumption is
not true and thus
the analysis under-
estimates the costs
of searching and
insertion.
1.
The hash table is large
2.
Each probe in the hash table is independent of the previous probe.
Assumption 1 is reasonable; otherwise, we would not be bothering with a
hash table. Assumption 2 says that, if the fraction of the table that is full
is
λ
,
each time we examine a cell the probability that it is occupied is also
λ
,
independent of any previous probes. Independence is an important sta-
tistical property that greatly simplifies the analysis of random events.
Unfortunately, as discussed in Section 20.3.2, the assumption of indepen-
dence is not only unjustified, but it also is erroneous. Thus the naive anal-
ysis that we perform is incorrect. Even so, it is helpful because it tells us
what we can hope to achieve if we are more careful about how collisions
are resolved. As mentioned earlier in the chapter, the performance of the
hash table depends on how full the table is. Its fullness is given by the
load factor.
The
load factor
of a
probing hash table
is the fraction of
the table that is full.
It ranges from 0
(empty) to 1 (full).
definition:
The
load factor,
,
of a probing hash table is the fraction of the table
that is full. The load factor ranges from 0 (empty) to 1 (completely full).
λ
We can now give a simple but incorrect analysis of linear probing in
Theorem 20.1.
Search WWH ::
Custom Search