Hashing as a Dictionary Implementation - Data Structures and Abstractions with Java

Java Reference

In-Depth Information

T he previous chapter described hashing as a technique for implementing a dictionary when

searching is the primary task. We now study hashing's performance and examine the details of its

implementation in Java.

The Efficiency of Hashing

22.1

As you saw in the previous chapter, implementations of the ADT dictionary depend on whether the

dictionary requires distinct search keys. In this section, we consider only dictionaries with distinct

search keys. Recall that the add method for such a dictionary must ensure that duplicate search keys

do not occur.

Each of the dictionary operations getValue , remove , and add searches the hash table for a

given search key. The success or failure of a search for a given key directly affects the success or

failure of the retrieval and removal operations. The successful addition of a new entry occurs after a

search for a given key fails. An unsuccessful addition replaces the value of an existing entry instead

of adding a new entry. This operation occurs after a successful search for a given key. Thus, we

have the following observations about the time efficiency of these operations:

● A successful retrieval or removal has the same efficiency as a successful search

● An unsuccessful retrieval or removal has the same efficiency as an unsuccessful search

● A successful addition has the same efficiency as an unsuccessful search

● An unsuccessful addition has the same efficiency as a successful search

So it is sufficient to analyze the time efficiency of searching the hash table for a given search key.

VideoNote

Hashing efficiency

Note: The successful retrieval of an entry searches the same chain or probe sequence that

was searched when the entry was first added to the hash table. Thus, the cost of a successful

retrieval of an entry is the same as the cost of inserting that entry.

The Load Factor

22.2

We began our discussion of hashing in the previous chapter with a perfect hash function that caused

no collisions. If you can find a perfect hash function for your particular set of search keys, using it

to implement the ADT dictionary will provide operations that are each O(1). Such an implementa-

tion is ideal. The good news is that finding a perfect hash function is quite feasible in certain situa-

tions. Unfortunately, using a perfect hash function is not always possible or practical. In those

situations, collisions are likely to occur.

Resolving a collision takes time and thus causes the dictionary operations to be slower than an

O(1) operation. As a hash table fills, collisions occur more often, decreasing performance even fur-

ther. Since collision resolution takes considerably more time than evaluating the hash function, it is

the prime contributor to the cost of hashing.

To help us express this cost, we define a measure of how full a hash table is. This measure—the

load factor

λ

—is the ratio of the size of the dictionary to the size of the hash table. That is,

Number of entries in the dictionary

Number of locations in the hash table

λ

=

----------------------------------------------------------------------------------------------------

Data Structures and Abstractions with Java

Search WWH ::

Custom Search

Home