Java Reference
In-Depth Information
5
Insert
Delete
4
3
2
1
0 .2 .4 .6 .8 1.0
Figure9.10 Growth of expected record accesses with . The horizontal axis is
the value for , the vertical axis is the expected number of accesses to the hash
table. Solid lines show the cost for “random” probing (a theoretical lower bound
on the cost), while dashed lines show the cost for linear probing (a relatively poor
collision resolution strategy). The two leftmost lines show the cost for insertion
(equivalently, unsuccessful search); the two rightmost lines show the cost for dele-
tion (equivalently, successful search).
accessed, the chance of the next record access coming to the same disk block is
no better than random chance in a well-designed hash system. This is because a
good hashing implementation breaks up relationships between search keys. Instead
of improving performance by taking advantage of locality of reference, hashing
trades increased hash table space for an improved chance that the record will be
in its home position. Thus, the more space available for the hash table, the more
efficient hashing should be.
Depending on the pattern of record accesses, it might be possible to reduce the
expected cost of access even in the face of collisions. Recall the 80/20 rule: 80%
of the accesses will come to 20% of the data. In other words, some records are
accessed more frequently. If two records hash to the same home position, which
would be better placed in the home position, and which in a slot further down the
probe sequence? The answer is that the record with higher frequency of access
should be placed in the home position, because this will reduce the total number of
record accesses. Ideally, records along a probe sequence will be ordered by their
frequency of access.
One approach to approximating this goal is to modify the order of records along
the probe sequence whenever a record is accessed. If a search is made to a record
Search WWH ::




Custom Search