Java Reference
In-Depth Information
9.4.4
Analysis of Closed Hashing
How efficient is hashing? We can measure hashing performance in terms of the
number of record accesses required when performing an operation. The primary
operations of concern are insertion, deletion, and search. It is useful to distinguish
between successful and unsuccessful searches. Before a record can be deleted, it
must be found. Thus, the number of accesses required to delete a record is equiv-
alent to the number required to successfully search for it. To insert a record, an
empty slot along the record's probe sequence must be found. This is equivalent to
an unsuccessful search for the record (recall that a successful search for the record
during insertion should generate an error because two records with the same key
are not allowed to be stored in the table).
When the hash table is empty, the first record inserted will always find its home
position free. Thus, it will require only one record access to find a free slot. If all
records are stored in their home positions, then successful searches will also require
only one record access. As the table begins to fill up, the probability that a record
can be inserted into its home position decreases. If a record hashes to an occupied
slot, then the collision resolution policy must locate another slot in which to store
it. Finding records not stored in their home position also requires additional record
accesses as the record is searched for along its probe sequence. As the table fills
up, more and more records are likely to be located ever further from their home
positions.
From this discussion, we see that the expected cost of hashing is a function of
how full the table is. Define the load factor for the table as = N=M, where N
is the number of records currently in the table.
An estimate of the expected cost for an insertion (or an unsuccessful search)
can be derived analytically as a function of in the case where we assume that
the probe sequence follows a random permutation of the slots in the hash table.
Assuming that every slot in the table has equal probability of being the home slot
for the next record, the probability of finding the home position occupied is . The
probability of finding both the home position occupied and the next slot on the
probe sequence occupied is N(N1)
M(M1) . The probability of i collisions is
N(N 1)(N i + 1)
M(M 1)(M i + 1) :
If N and M are large, then this is approximately (N=M) i . The expected number
of probes is one plus the sum over i 1 of the probability of i collisions, which is
approximately
X
1
(N=M) i = 1=(1 ):
1 +
i=1
 
Search WWH ::




Custom Search