Java Reference
In-Depth Information
10.2.1 The Hash Function
In the previous section, we saw how an integer key can be “hashed” to a table location. It turns out that the
“remainder” operation (%) often gives good results for such keys. But what if the keys were non-numeric, for example,
words or names?
The first task is to convert a non-numeric key to a number and then apply “remainder.” Suppose the key is a word.
Perhaps the simplest thing to do is add up the numeric value of each letter in the word. If the word is stored in a string
variable,
word
, we can do this as follows:
int wordNum = 0;
for (int h = 0; h < word.length(); h++) wordNum += word.charAt(h);
loc = wordNum % n + 1; //loc is assigned a value from 1 to n
This method will work, but one objection is that words that contain the same letters would hash to the same
location. For example,
mate
,
meat
, and
team
will all hash to the same location. In hashing, we must try to avoid
deliberately hashing keys to the same location. One way around this is to assign a weight to each letter depending on
its position in the word.
We can assign weights arbitrarily—the main goal is to avoid hashing keys with the same letters to the same
location. For instance, we can assign 3 to the first position, 5 to the second position, 7 to the third position, and so on.
The following shows how:
int wordNum = 0;
int w = 3;
for (int h = 0; h < word.length(); h++) {
wordNum += word.charAt(h) * w;
w = w + 2;
}
loc = wordNum % n + 1; //loc is assigned a value from 1 to n
The same technique will work if a key contains arbitrary characters.
In hashing, we want the keys to be scattered all over the table. If, for instance, keys are hashed to one area of the
table, we can end up with an unnecessarily high number of collisions. To this end, we should try to use
all
of the key.
For example, if the keys are alphabetic, it would be unwise to map all keys beginning with the same letter to the same
location. Put another way, we should avoid systematically hitting the same location.
And since hashing is meant to be fast, the hash function should be relatively easy to calculate. The speed
advantage will be diminished if we spend too much time computing the hash location.
10.2.2 Deleting an Item from a Hash Table
Consider, again, the array after all the sample numbers have been inserted:
num
84
23
61
52
16
43
31
33
59
1
2
3
4
5
6
7
8
9
10
11
12
Recall that
43
and
31
both hashed initially to location
8
. Suppose we want to delete
43
. The first thought might be
to set its location to empty. Assume we did this (set
num[8]
to empty) and were now looking for
31
. This will hash to
8
;
but since
num[8]
is empty, we will conclude, wrongly, that
31
is not in the table. So, we cannot delete an item simply
by setting its location to empty since other items may become unreachable.
The easiest solution is to set its location to a
deleted
value—some value that cannot be confused with
empty
or a
key. In this example, if the keys are positive integers, we can use
0
for
Empty
and
-1
for
Deleted
.
Search WWH ::
Custom Search