Introducing Hashing - Data Structures and Abstractions with Java

Java Reference

In-Depth Information

Note: A perfect hash function maps each search key into a different integer that is suitable

as an index to the hash table.

21.4

Typical hashing. Because we need a database of all street addresses in the previous example, we

must have one entry in the hash table for each telephone number. Our perfect hash function needs

a hash table this large because it produces 10,000 different indices between 0 and 9999 from the

10,000 possible search keys. This hash table is always full if every telephone number in the 555

exchange is assigned. Although a full hash table is quite reasonable for this application, most

hash tables are not full and can even be sparse —that is, have only a few of their elements actually

in use.

For example, if our small town required only 700 telephone numbers, most of the 10,000-

location hash table would be unused. We would waste most of the space allocated to the hash

table. If the 700 numbers were not sequential, we would need a different hash function if we

wanted to use a smaller hash table.

We might develop this hash function as follows. Given a nonnegative integer i and a hash

table with n locations, the value of i modulo n ranges from 0 to n - 1. Since i is nonnegative, i

modulo n is the integer remainder after dividing i by n . This value is a valid index for the hash

table. So, a hash function h for a telephone number could have the following algorithm:

Algorithm getHashIndex(phoneNumber)

// Returns an index to an array of tableSize locations.

i = last four digits of phoneNumber

return i % tableSize

This hash function—like typical hash functions—performs two steps:

1.

Convert the search key to an integer called the hash code .

2.

Compress the hash code into the range of indices for the hash table.

Often the search key is not an integer, and frequently it is a string. So, a hash function first converts

the key to an integer hash code. Next, it transforms that integer into one that is suitable as an index

to the particular hash table.

The hash function that the algorithm getHashIndex describes is not a perfect hash function when

tableSize is less than 10,000. Since 10,000 telephone numbers map into tableSize indices, some

telephone numbers will map into the same index. We call such an occurrence a collision . For exam-

ple, if tableSize is 101, getHashIndex("555-1214") and getHashIndex("555-8132") each map

into 52. If we have already stored the street address for 555-1214 in hashTable[52] , as Figure 21-2

shows, what will we do with the address for 555-8132? Handling such collisions is called collision

resolution . Before we look at collision resolution, we explore hash functions a bit further.

Note: Typical hash functions are not perfect, because they can allow more than one search

key to map into a single index, causing a collision in the hash table.

Search WWH ::

Custom Search

Home