Searching - Data Structures and Algorithm Analysis

Java Reference

In-Depth Information

9.4.4

Analysis of Closed Hashing

How efficient is hashing? We can measure hashing performance in terms of the

number of record accesses required when performing an operation. The primary

operations of concern are insertion, deletion, and search. It is useful to distinguish

between successful and unsuccessful searches. Before a record can be deleted, it

must be found. Thus, the number of accesses required to delete a record is equiv-

alent to the number required to successfully search for it. To insert a record, an

empty slot along the record's probe sequence must be found. This is equivalent to

an unsuccessful search for the record (recall that a successful search for the record

during insertion should generate an error because two records with the same key

are not allowed to be stored in the table).

When the hash table is empty, the first record inserted will always find its home

position free. Thus, it will require only one record access to find a free slot. If all

records are stored in their home positions, then successful searches will also require

only one record access. As the table begins to fill up, the probability that a record

can be inserted into its home position decreases. If a record hashes to an occupied

slot, then the collision resolution policy must locate another slot in which to store

it. Finding records not stored in their home position also requires additional record

accesses as the record is searched for along its probe sequence. As the table fills

up, more and more records are likely to be located ever further from their home

positions.

From this discussion, we see that the expected cost of hashing is a function of

how full the table is. Define the load factor for the table as = N=M, where N

is the number of records currently in the table.

An estimate of the expected cost for an insertion (or an unsuccessful search)

can be derived analytically as a function of in the case where we assume that

the probe sequence follows a random permutation of the slots in the hash table.

Assuming that every slot in the table has equal probability of being the home slot

for the next record, the probability of finding the home position occupied is . The

probability of finding both the home position occupied and the next slot on the

probe sequence occupied is N(N1)

M(M1) . The probability of i collisions is

N(N 1)(N i + 1)

M(M 1)(M i + 1) :

If N and M are large, then this is approximately (N=M) i . The expected number

of probes is one plus the sum over i 1 of the probability of i collisions, which is

approximately

X

1

(N=M) i = 1=(1 ):

1 +

i=1

Data Structures and Algorithm Analysis

Search WWH ::

Custom Search

Home