Java Reference
In-Depth Information
Theorem 20.1
If independence of probes is assumed, the average number of cells examined in an
insertion using linear probing is 1/(1 - λ) .
For a table with a load factor of
.
Consequently, the expected number of independent trials required to find an empty
cell is 1/(1 - λ) .
λ
, the probability of any cell's being empty is 1 -
λ
Proof
In the proof of Theorem 20.1 we use the fact that, if the probability of
some event's occurring is p , then on average 1/ p trials are required until
the event occurs, provided that the trials are independent. For example, the
expected number of coin flips until a heads occurs is two, and the expected
number of rolls of a single six-sided die until a 4 occurs is six, assuming
independence.
20.3.2 what really happens: primary clustering
Unfortunately, independence does not hold, as shown in Figure 20.6. Part (a)
shows the result of filling a hash table to 70 percent capacity, if all successive
probes are independent. Part (b) shows the result of linear probing. Note the
group of clusters: the phenomenon known as primary clustering .
In primary clustering, large blocks of occupied cells are formed. Any key
that hashes into this cluster requires excessive attempts to resolve the colli-
sion, and then it adds to the size of the cluster. Not only do items that collide
because of identical hash functions cause degenerate performance, but also an
item that collides with an alternative location for another item causes poor
performance. The mathematical analysis required to take this phenomenon
into account is complex but has been solved, yielding Theorem 20.2.
The effect of pri-
mary clustering is
the formation of
large clusters of
occupied cells,
making insertions
into the cluster
expensive (and
then the insertion
makes the cluster
even larger).
The average number of cells examined in an insertion using linear probing is roughly
.
Theorem 20.2
(
111λ
+
(
-
)
2
)
2
Proof
The proof is beyond the scope of this text. See reference [6].
For a half-full table, we obtain 2.5 as the average number of cells exam-
ined during an insertion. This outcome is almost the same as what the naive
 
Search WWH ::




Custom Search