Java Reference
In-Depth Information
objects will have different hash codes. To be useful as a dictionary implementation, hashing must map
equal objects into the same location in a hash table. Thus, a class should define its own version of
hashCode that adheres to the following guidelines.
Note: Guidelines for the method hashCode
If a class overrides the method equals , it should override hashCode .
If the method equals considers two objects equal, hashCode must return the same
value for both objects.
If you call an object's hashCode more than once during the execution of a program,
and if the object's data remains the same during this time, hashCode must return the
same hash code.
An object's hash code during one execution of a program can differ from its hash
code during another execution of the same program.
A perfect hash function would require that unequal objects have distinct hash codes. In gen-
eral, however, unequal objects might have the same hash codes. Since duplicate hash codes lead to
collisions, you want to avoid this situation when possible.
21.7
A hash code for a string. Search keys are often strings, so generating a good hash code from a
string is important. Typically, you begin by assigning an integer to each character in the string. For
example, you could assign the integers 1 through 26 to the letters “A” through “Z” and the integers
27 through 52 to the letters “a” through “z.” However, using a character's Unicode integer is more
common and actually easier to do.
Suppose that the search keys for a telephone directory are names such as Brett , Carol , Gail ,
and Josh . You can compute hash codes for these names in several ways. For example, you could
take the Unicode value of the first letter in each name and get distinct hash codes. But if several
names begin with the same letter, their hash codes will be the same if you use this scheme. Since
the letters that occur in any one position of a name do not occur with equal probability, a hash
function that uses any particular letter will not distribute the names uniformly throughout the
hash table.
Suppose that you sum the Unicode values for each letter in the search key. In an application
where two different search keys never contain the same letters, this approach can work. But if
your search keys are airport codes, for example, DUB and BUD would have the same hash code.
This approach also can restrict the range of the hash codes, since the Unicode values for letters
lie between 65 and 122. Thus, three-letter words would map into values between 195 and 366
under this plan.
Note: Real-world data is not uniformly distributed.
21.8
A better hash code for a string. A better approach to generating a hash code for a string involves
multiplying the Unicode value of each character by a factor based on the character's position within
the string. The hash code is then the sum of these products. Specifically, if the string s has n charac-
ters, let u i be the Unicode value for the i th character in s ( i is zero for the first character). Then the
hash code can have the form
u 0 g n - 1 + u 1 g n - 2 + + u n - 2 g + u n - 1
 
Search WWH ::




Custom Search