Java Reference
In-Depth Information
10.2.1 The Hash Function
In the previous section, we saw how an integer key can be “hashed” to a table location. It turns out that the
“remainder” operation (%) often gives good results for such keys. But what if the keys were non-numeric, for example,
words or names?
The first task is to convert a non-numeric key to a number and then apply “remainder.” Suppose the key is a word.
Perhaps the simplest thing to do is add up the numeric value of each letter in the word. If the word is stored in a string
variable, word , we can do this as follows:
int wordNum = 0;
for (int h = 0; h < word.length(); h++) wordNum += word.charAt(h);
loc = wordNum % n + 1; //loc is assigned a value from 1 to n
This method will work, but one objection is that words that contain the same letters would hash to the same
location. For example, mate , meat , and team will all hash to the same location. In hashing, we must try to avoid
deliberately hashing keys to the same location. One way around this is to assign a weight to each letter depending on
its position in the word.
We can assign weights arbitrarily—the main goal is to avoid hashing keys with the same letters to the same
location. For instance, we can assign 3 to the first position, 5 to the second position, 7 to the third position, and so on.
The following shows how:
int wordNum = 0;
int w = 3;
for (int h = 0; h < word.length(); h++) {
wordNum += word.charAt(h) * w;
w = w + 2;
}
loc = wordNum % n + 1; //loc is assigned a value from 1 to n
The same technique will work if a key contains arbitrary characters.
In hashing, we want the keys to be scattered all over the table. If, for instance, keys are hashed to one area of the
table, we can end up with an unnecessarily high number of collisions. To this end, we should try to use all of the key.
For example, if the keys are alphabetic, it would be unwise to map all keys beginning with the same letter to the same
location. Put another way, we should avoid systematically hitting the same location.
And since hashing is meant to be fast, the hash function should be relatively easy to calculate. The speed
advantage will be diminished if we spend too much time computing the hash location.
10.2.2 Deleting an Item from a Hash Table
Consider, again, the array after all the sample numbers have been inserted:
num
84
23
61
52
16
43
31
33
59
1
2
3
4
5
6
7
8
9
10
11
12
Recall that 43 and 31 both hashed initially to location 8 . Suppose we want to delete 43 . The first thought might be
to set its location to empty. Assume we did this (set num[8] to empty) and were now looking for 31 . This will hash to 8 ;
but since num[8] is empty, we will conclude, wrongly, that 31 is not in the table. So, we cannot delete an item simply
by setting its location to empty since other items may become unreachable.
The easiest solution is to set its location to a deleted value—some value that cannot be confused with empty or a
key. In this example, if the keys are positive integers, we can use 0 for Empty and -1 for Deleted .
 
Search WWH ::




Custom Search