Hash tables - Data Structures and Problem Solving Using Java

Java Reference

In-Depth Information

basic ideas

20.1

The hash table supports the retrieval or deletion of any named item. We want

to be able to support the basic operations in constant time, as for the stack and

queue. Because the accesses are much less restricted, this support seems like

an impossible goal. That is, surely when the size of the set increases, searches

in the set should take longer. However, that is not necessarily the case.

Suppose that all the items we are dealing with are small nonnegative inte-

gers, ranging from 0 to 65,535. We can use a simple array to implement each

operation as follows. First, we initialize an array a that is indexed from 0 to

65,535 with all 0s. To perform insert(i) , we execute a[i]++ . Note that a[i]

represents the number of times that i has been inserted. To perform find(i) ,

we verify that a[i] is not 0. To perform remove(i) , we make sure that a[i] is

positive and then execute a[i]-- . The time for each operation is clearly con-

stant; even the overhead of the array initialization is a constant amount of

work (65,536 assignments).

There are two problems with this solution. First, suppose that we have 32-

bit integers instead of 16-bit integers. Then the array a must hold 4 billion

items, which is impractical. Second, if the items are not integers but instead

are strings (or something even more generic), they cannot be used to index an

array.

The second problem is not really a problem at all. Just as a number 1234

is a collection of digits 1 , 2 , 3 , and 4 , the string "junk" is a collection of char-

acters 'j' , 'u' , 'n' , and 'k' . Note that the number 1234 is just

. Recall from Section 12.1 that an ASCII

character can typically be represented in 7 bits as a number between 0 and

127. Because a character is basically a small integer, we can interpret a

string as an integer. One possible representation is 'j' ⋅ 128 3 + 'u' ⋅ 128 2 +

'n'

The hash table is

used to implement

a set in constant

time per operation.

1 10 3

⋅

2 10 2

⋅

3 10 1

⋅

4 10 0

⋅

⋅ 128 1 + 'k' ⋅ 128 0 . This approach allows the simple array implementa-

tion discussed previously.

The problem with this strategy is that the integer representation described

generates huge integers: The representation for "junk" yields 224,229,227,

and longer strings generate much larger representations. This result brings us

back to the first problem: How do we avoid using an absurdly large array?

We do so by using a function that maps large numbers (or strings inter-

preted as numbers) into smaller, more manageable numbers. A function that

maps an item into a small index is known as a hash function. If x is an arbi-

trary (nonnegative) integer, then x%tableSize generates a number between 0

and tableSize-1 suitable for indexing into an array of size tableSize . If s is a

string, we can convert s to a large integer x by using the method suggested

previously and then apply the mod operator ( % ) to get a suitable index. Thus, if

A hash function

converts the item

into an integer suit-

able to index an

array where the

item is stored. If the

hash function were

one to one, we

could access the

item by its array

index.

Data Structures and Problem Solving Using Java

Search WWH ::

Custom Search

Home