Java Reference
In-Depth Information
T he previous chapter described hashing as a technique for implementing a dictionary when
searching is the primary task. We now study hashing's performance and examine the details of its
implementation in Java.
The Efficiency of Hashing
22.1
As you saw in the previous chapter, implementations of the ADT dictionary depend on whether the
dictionary requires distinct search keys. In this section, we consider only dictionaries with distinct
search keys. Recall that the add method for such a dictionary must ensure that duplicate search keys
do not occur.
Each of the dictionary operations getValue , remove , and add searches the hash table for a
given search key. The success or failure of a search for a given key directly affects the success or
failure of the retrieval and removal operations. The successful addition of a new entry occurs after a
search for a given key fails. An unsuccessful addition replaces the value of an existing entry instead
of adding a new entry. This operation occurs after a successful search for a given key. Thus, we
have the following observations about the time efficiency of these operations:
A successful retrieval or removal has the same efficiency as a successful search
An unsuccessful retrieval or removal has the same efficiency as an unsuccessful search
A successful addition has the same efficiency as an unsuccessful search
An unsuccessful addition has the same efficiency as a successful search
So it is sufficient to analyze the time efficiency of searching the hash table for a given search key.
VideoNote
Hashing efficiency
Note: The successful retrieval of an entry searches the same chain or probe sequence that
was searched when the entry was first added to the hash table. Thus, the cost of a successful
retrieval of an entry is the same as the cost of inserting that entry.
The Load Factor
22.2
We began our discussion of hashing in the previous chapter with a perfect hash function that caused
no collisions. If you can find a perfect hash function for your particular set of search keys, using it
to implement the ADT dictionary will provide operations that are each O(1). Such an implementa-
tion is ideal. The good news is that finding a perfect hash function is quite feasible in certain situa-
tions. Unfortunately, using a perfect hash function is not always possible or practical. In those
situations, collisions are likely to occur.
Resolving a collision takes time and thus causes the dictionary operations to be slower than an
O(1) operation. As a hash table fills, collisions occur more often, decreasing performance even fur-
ther. Since collision resolution takes considerably more time than evaluating the hash function, it is
the prime contributor to the cost of hashing.
To help us express this cost, we define a measure of how full a hash table is. This measure—the
load factor
λ
—is the ratio of the size of the dictionary to the size of the hash table. That is,
Number of entries in the dictionary
Number of locations in the hash table
λ
=
----------------------------------------------------------------------------------------------------
 
 
 
Search WWH ::




Custom Search