Introducing Hashing - Data Structures and Abstractions with Java

Java Reference

In-Depth Information

Now let's see how to do this in Java. The expression key >> 32 shifts the 64-bit key to the

right by 32 bits, in effect eliminating its right half. Java's exclusive-or operator is ^ and has the

following effect on 1-bit quantities:

0 ^ 0 is 0

1 ^ 1 is 0

0 ^ 1 is 1

1 ^ 0 is 1

For two multibit quantities, the operator combines pairs of corresponding bits. So

1100 ^ 1010 is 0110

Thus, the expression key ^ (key >> 32) uses an exclusive-or operation to combine the halves of a

64-bit key . Although the result has 64 bits, the rightmost 32 bits contain the combined halves of

key . We discard the leftmost 32 bits by casting the result to an int . Thus, the necessary computa-

tion is

( int )(key ^ (key >> 32))

We can perform a similar computation for a search key of type double . Since key is a real

value, we cannot use it in the previous expression. Instead, we must get key 's bit pattern by calling

Double.doubleToLongBits(key) . Thus, the following statements produce the desired hash code:

long bits = Double.doubleToLongBits(key);

int hashCode = ( int )(bits ^ (bits >> 32));

Why not simply cast the search key from double to int ? Since the search key is a real value,

casting it to an int will simply give us the integral portion of the value. For example, if the key's

value is 32.98, casting it to int results in the integer 32. While we could use 32 as the hash code,

all search keys that have 32 as their integer portion also would have a hash code of 32. Unless

you know that your real values have distinct integral portions, casting them to int values can

cause many collisions.

The hash code of a search key of type float can be simply its 32 bits. You get these by call-

ing Float.floatToIntBits(key) .

These computations of hash codes for the primitives types are actually used by the corre-

sponding wrapper classes in their implementations of the method hashCode .

Compressing a Hash Code into an Index for the Hash Table

21.10

The most common way to scale an integer so that it lies within a given range of values is to use

Java's % operator. For a positive hash code c and a positive integer n , c % n divides c by n and takes

the remainder as the result. This remainder lies between 0 and n - 1. Thus, c % n is ideal for the

index of a hash table that has n locations.

So, n should equal the size of the hash table, but not any n will do. For example, if n is even,

c % n has the same parity as c —that is, if c is even, c % n is even; if c is odd, c % n is odd. If the

hash codes are biased toward either even or odd values (and note that hash codes based on mem-

ory addresses are typically even), the indices to the hash table will have the same bias. Instead of

a uniform distribution of indices, you will leave out the indices of many table locations if n is

even. Thus, n —the size of the hash table—always should be an odd number.

When n is a prime number —one that is divisible only by 1 and itself— c % n provides values

that are distributed throughout the index range 0 through n - 1. Prime numbers—with the excep-

tion of 2—are odd.

Data Structures and Abstractions with Java

Search WWH ::

Custom Search

Home