Java Reference
In-Depth Information
basic ideas
20.1
The hash table supports the retrieval or deletion of any named item. We want
to be able to support the basic operations in constant time, as for the stack and
queue. Because the accesses are much less restricted, this support seems like
an impossible goal. That is, surely when the size of the set increases, searches
in the set should take longer. However, that is not necessarily the case.
Suppose that all the items we are dealing with are small nonnegative inte-
gers, ranging from 0 to 65,535. We can use a simple array to implement each
operation as follows. First, we initialize an array a that is indexed from 0 to
65,535 with all 0s. To perform insert(i) , we execute a[i]++ . Note that a[i]
represents the number of times that i has been inserted. To perform find(i) ,
we verify that a[i] is not 0. To perform remove(i) , we make sure that a[i] is
positive and then execute a[i]-- . The time for each operation is clearly con-
stant; even the overhead of the array initialization is a constant amount of
work (65,536 assignments).
There are two problems with this solution. First, suppose that we have 32-
bit integers instead of 16-bit integers. Then the array a must hold 4 billion
items, which is impractical. Second, if the items are not integers but instead
are strings (or something even more generic), they cannot be used to index an
array.
The second problem is not really a problem at all. Just as a number 1234
is a collection of digits 1 , 2 , 3 , and 4 , the string "junk" is a collection of char-
acters 'j' , 'u' , 'n' , and 'k' . Note that the number 1234 is just
. Recall from Section 12.1 that an ASCII
character can typically be represented in 7 bits as a number between 0 and
127. Because a character is basically a small integer, we can interpret a
string as an integer. One possible representation is 'j' 128 3 + 'u' 128 2 +
'n'
The hash table is
used to implement
a set in constant
time per operation.
1 10 3
+
2 10 2
+
3 10 1
+
4 10 0
128 1 + 'k' 128 0 . This approach allows the simple array implementa-
tion discussed previously.
The problem with this strategy is that the integer representation described
generates huge integers: The representation for "junk" yields 224,229,227,
and longer strings generate much larger representations. This result brings us
back to the first problem: How do we avoid using an absurdly large array?
We do so by using a function that maps large numbers (or strings inter-
preted as numbers) into smaller, more manageable numbers. A function that
maps an item into a small index is known as a hash function. If x is an arbi-
trary (nonnegative) integer, then x%tableSize generates a number between 0
and tableSize-1 suitable for indexing into an array of size tableSize . If s is a
string, we can convert s to a large integer x by using the method suggested
previously and then apply the mod operator ( % ) to get a suitable index. Thus, if
A hash function
converts the item
into an integer suit-
able to index an
array where the
item is stored. If the
hash function were
one to one, we
could access the
item by its array
index.
 
 
Search WWH ::




Custom Search