Java Reference
In-Depth Information
insertions and deletions. After the table is loaded with the initial collection of
records, the first few deletions will lengthen the average probe sequence distance
for records (it will add tombstones). Over time, the average distance will reach
an equilibrium point because insertions will tend to decrease the average distance
by filling in tombstone slots. For example, after initially loading records into the
database, the average path distance might be 1.2 (i.e., an average of 0.2 accesses
per search beyond the home position will be required). After a series of insertions
and deletions, this average distance might increase to 1.6 due to tombstones. This
seems like a small increase, but it is three times longer on average beyond the home
position than before deletions.
Two possible solutions to this problem are
1. Do a local reorganization upon deletion to try to shorten the average path
length. For example, after deleting a key, continue to follow the probe se-
quence of that key and swap records further down the probe sequence into
the slot of the recently deleted record (being careful not to remove any key
from its probe sequence). This will not work for all collision resolution poli-
cies.
2. Periodically rehash the table by reinserting all records into a new hash table.
Not only will this remove the tombstones, but it also provides an opportunity
to place the most frequently accessed records into their home positions.
9.5
Further Reading
For a comparison of the efficiencies for various self-organizing techniques, see
Bentley and McGeoch, “Amortized Analysis of Self-Organizing Sequential Search
Heuristics” [BM85]. The text compression example of Section 9.2 comes from
Bentley et al., “A Locally Adaptive Data Compression Scheme” [BSTW86]. For
more on Ziv-Lempel coding, see Data Compression: Methods and Theory by
James A. Storer [Sto88]. Knuth covers self-organizing lists and Zipf distributions
in Volume 3 of The Art of Computer Programming[Knu98].
Introduction to Modern Information Retrieval by Salton and McGill [SM83] is
an excellent source for more information about document retrieval techniques.
See the paper “Practical Minimal Perfect Hash Functions for Large Databases”
by Fox et al. [FHCD92] for an introduction and a good algorithm for perfect hash-
ing.
For further details on the analysis for various collision resolution policies, see
Knuth, Volume 3 [Knu98] and Concrete Mathematics: A Foundation for Computer
Science by Graham, Knuth, and Patashnik [GKP94].
The model of hashing presented in this chapter has been of a fixed-size hash
table. A problem not addressed is what to do when the hash table gets half full and
more records must be inserted. This is the domain of dynamic hashing methods.
Search WWH ::




Custom Search