A successful search (or a deletion) has the same cost as originally inserting
that record. However, the expected insertion cost depends on the value of $\alpha$
not at the time of the deletion, but rather at the time of the original insertion.
We can derive an estimate of this cost (essentially an average over all the
insertion costs) by integrating from 0 to the current value of $\alpha$, yielding a
result of
\[
\frac{1}{\alpha} \int_0^{\alpha} \frac{1}{1 - x}\, dx
  \;=\; \frac{1}{\alpha} \log_e \frac{1}{1 - \alpha}.
\]
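As a quick check on this derivation, the closed form can be compared numerically against a direct average of the insertion costs $1/(1-x)$ as the load factor grows from 0 to $\alpha$. The following Java sketch does exactly that; the class and method names are illustrative, not from the text.

```java
/** Sketch: compare the closed-form expected cost of a successful search,
 *  (1/alpha) * log_e(1/(1 - alpha)), against a direct numerical average of
 *  the insertion costs 1/(1 - x) for x running from 0 up to alpha.
 *  Class and method names are illustrative, not from the text. */
public class SuccessfulSearchCost {

  /** Closed form: (1/alpha) * log_e(1/(1 - alpha)). */
  static double closedForm(double alpha) {
    return (1.0 / alpha) * Math.log(1.0 / (1.0 - alpha));
  }

  /** Average of the insertion costs 1/(1 - x) over [0, alpha],
   *  approximating the integral with a simple midpoint sum. */
  static double averagedInsertCost(double alpha, int steps) {
    double sum = 0.0;
    double dx = alpha / steps;
    for (int i = 0; i < steps; i++) {
      double x = (i + 0.5) * dx;   // load factor at the time of that insertion
      sum += 1.0 / (1.0 - x);      // expected cost of that insertion
    }
    return sum / steps;            // average over all insertions
  }

  public static void main(String[] args) {
    for (double alpha : new double[] {0.25, 0.5, 0.75, 0.9}) {
      System.out.printf("alpha=%.2f  closed form=%.4f  numeric average=%.4f%n",
          alpha, closedForm(alpha), averagedInsertCost(alpha, 100000));
    }
  }
}
```

At $\alpha = 0.5$, for example, both columns come out near $2 \log_e 2 \approx 1.39$ probes.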
It is important to realize that these equations represent the expected cost for
operations using the unrealistic assumption that the probe sequence is based on a
random permutation of the slots in the hash table (thus avoiding all expense result-
ing from clustering). Thus, these costs are lower-bound estimates in the average
case. The true average cost under linear probing is $\frac{1}{2}\left(1 + 1/(1 - \alpha)^2\right)$ for insertions
or unsuccessful searches and $\frac{1}{2}\left(1 + 1/(1 - \alpha)\right)$ for deletions or successful searches.
Proofs for these results can be found in the references cited in Section 9.5.
Figure 9.10 shows the graphs of these four equations to help you visualize the
expected performance of hashing based on the load factor. The two solid lines show
the costs in the case of a “random” probe sequence for (1) insertion or unsuccessful
search and (2) deletion or successful search. As expected, the cost for insertion or
unsuccessful search grows faster, because these operations typically search further
down the probe sequence. The two dashed lines show equivalent costs for linear
probing. As expected, the cost of linear probing grows faster than the cost for
“random” probing.
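For readers who want concrete numbers rather than curves, the four equations can be tabulated directly. The following Java sketch (class and method names are illustrative, not from the text) prints the expected number of probes for each case across a range of load factors; the linear-probing columns grow visibly faster, matching the figure.

```java
/** Sketch: the four expected-cost formulas as functions of load factor alpha.
 *  The "random" probing values are the idealized lower bounds; the linear
 *  probing values are the true averages quoted above. Names are illustrative. */
public class ProbeCosts {

  // Idealized "random" probing: insertion or unsuccessful search.
  static double randomInsert(double alpha) {
    return 1.0 / (1.0 - alpha);
  }

  // Idealized "random" probing: deletion or successful search.
  static double randomSearch(double alpha) {
    return (1.0 / alpha) * Math.log(1.0 / (1.0 - alpha));
  }

  // Linear probing: insertion or unsuccessful search.
  static double linearInsert(double alpha) {
    return 0.5 * (1.0 + 1.0 / ((1.0 - alpha) * (1.0 - alpha)));
  }

  // Linear probing: deletion or successful search.
  static double linearSearch(double alpha) {
    return 0.5 * (1.0 + 1.0 / (1.0 - alpha));
  }

  public static void main(String[] args) {
    System.out.println("alpha  rand-ins  rand-srch  lin-ins  lin-srch");
    for (double alpha = 0.1; alpha <= 0.91; alpha += 0.1) {
      System.out.printf("%.1f   %8.3f  %8.3f %8.3f %8.3f%n",
          alpha, randomInsert(alpha), randomSearch(alpha),
          linearInsert(alpha), linearSearch(alpha));
    }
  }
}
```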
From Figure 9.10 we see that the cost for hashing when the table is not too full
is typically close to one record access. This is extraordinarily efficient, much better
than binary search, which requires $\log n$ record accesses. As $\alpha$ increases, so does
the expected cost. For small values of $\alpha$, the expected cost is low. It remains below
two until the hash table is about half full. When the table is nearly empty, adding
a new record to the table does not increase the cost of future search operations
by much. However, the additional search cost caused by each additional insertion
increases rapidly once the table becomes half full. Based on this analysis, the rule
of thumb is to design a hashing system so that the hash table never gets above half
full. Beyond that point performance will degrade rapidly. This requires that the
implementor have some idea of how many records are likely to be in the table at
maximum loading, and select the table size accordingly.
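As one possible way to apply this rule of thumb, a sizing helper can take the expected maximum number of records and allocate at least twice that many slots, keeping the load factor at or below 0.5 at peak load. The sketch below is only an illustration: the method name and the choice to round up to a power of two are assumptions, not recommendations from the text.

```java
/** Sketch of the half-full rule of thumb: given the expected maximum number
 *  of records, choose a table with at least twice that many slots so that
 *  the load factor stays at or below 0.5. Rounding up to a power of two is
 *  an assumption made here, not a requirement from the text. */
public class TableSizing {

  static int tableSizeFor(int expectedMaxRecords) {
    int minSlots = 2 * expectedMaxRecords;  // keep alpha <= 0.5 at peak load
    int size = 1;
    while (size < minSlots) {
      size <<= 1;                           // round up to a power of two
    }
    return size;
  }

  public static void main(String[] args) {
    // 1000 records at most -> 2048 slots, so alpha never exceeds about 0.49.
    System.out.println(tableSizeFor(1000));
  }
}
```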
You might notice that a recommendation to never let a hash table become more
than half full contradicts the disk-based space/time tradeoff principle, which strives
to minimize disk space to increase information density. Hashing represents an un-
usual situation in that there is no benefit to be expected from locality of reference.
In a sense, the hashing system implementor does everything possible to eliminate
the effects of locality of reference! Given the disk block containing the last record