Hashing - Advanced Topics in Java: Core Concepts in Data Structures

Java Reference

In-Depth Information

10.2.1 The Hash Function

In the previous section, we saw how an integer key can be “hashed” to a table location. It turns out that the

“remainder” operation (%) often gives good results for such keys. But what if the keys were non-numeric, for example,

words or names?

The first task is to convert a non-numeric key to a number and then apply “remainder.” Suppose the key is a word.

Perhaps the simplest thing to do is add up the numeric value of each letter in the word. If the word is stored in a string

variable, word , we can do this as follows:

int wordNum = 0;

for (int h = 0; h < word.length(); h++) wordNum += word.charAt(h);

loc = wordNum % n + 1; //loc is assigned a value from 1 to n

This method will work, but one objection is that words that contain the same letters would hash to the same

location. For example, mate , meat , and team will all hash to the same location. In hashing, we must try to avoid

deliberately hashing keys to the same location. One way around this is to assign a weight to each letter depending on

its position in the word.

We can assign weights arbitrarily—the main goal is to avoid hashing keys with the same letters to the same

location. For instance, we can assign 3 to the first position, 5 to the second position, 7 to the third position, and so on.

The following shows how:

int wordNum = 0;

int w = 3;

for (int h = 0; h < word.length(); h++) {

wordNum += word.charAt(h) * w;

w = w + 2;

}

loc = wordNum % n + 1; //loc is assigned a value from 1 to n

The same technique will work if a key contains arbitrary characters.

In hashing, we want the keys to be scattered all over the table. If, for instance, keys are hashed to one area of the

table, we can end up with an unnecessarily high number of collisions. To this end, we should try to use all of the key.

For example, if the keys are alphabetic, it would be unwise to map all keys beginning with the same letter to the same

location. Put another way, we should avoid systematically hitting the same location.

And since hashing is meant to be fast, the hash function should be relatively easy to calculate. The speed

advantage will be diminished if we spend too much time computing the hash location.

10.2.2 Deleting an Item from a Hash Table

Consider, again, the array after all the sample numbers have been inserted:

num

84

23

61

52

16

43

31

33

59

1

2

3

4

5

6

7

8

9

10

11

12

Recall that 43 and 31 both hashed initially to location 8 . Suppose we want to delete 43 . The first thought might be

to set its location to empty. Assume we did this (set num[8] to empty) and were now looking for 31 . This will hash to 8 ;

but since num[8] is empty, we will conclude, wrongly, that 31 is not in the table. So, we cannot delete an item simply

by setting its location to empty since other items may become unreachable.

The easiest solution is to set its location to a deleted value—some value that cannot be confused with empty or a

key. In this example, if the keys are positive integers, we can use 0 for Empty and -1 for Deleted .

Advanced Topics in Java: Core Concepts in Data Structures

Search WWH ::

Custom Search

Home