from the original table to the location at index 0 in the new table. And you cannot copy it to the location at index 83 in the new table, because you also need to consider collisions.

After creating a new, larger hash table of an appropriate size, you use the dictionary method add to add each item in the original hash table to the new table. The method computes the hash index using the size of the new table and handles any collisions. This process of enlarging a hash table and computing new hash indices for its contents is called rehashing. You can see that increasing the size of a hash table requires considerably more work than increasing the size of an ordinary array. Rehashing is a task that you should not do often.
Note: Rehashing
When the load factor λ becomes too large, resize the hash table. To compute the table's new size, first double its present size and then increase the result to the next prime number. Use the method add to add the current entries in the dictionary to the new hash table.
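The following Java sketch illustrates this rehashing process; the class and method names used here (RehashDemo, enlargeHashTable, getNextPrime) are illustrative only and are not taken from the chapter's dictionary implementation. When an addition would push the load factor past 0.5, the table is replaced by one whose length is the next prime after twice the old length, and each old entry is re-added with add so that its new hash index and any collisions are handled automatically.

   import java.util.Arrays;

   // A minimal sketch of rehashing for a linear-probing table of Integer keys.
   public class RehashDemo
   {
      private Integer[] table = new Integer[5];
      private int numberOfEntries = 0;

      // Adds a key, enlarging the table first if the addition would push the
      // load factor above 0.5; the hash index uses the current table length.
      public void add(int key)
      {
         if (2 * (numberOfEntries + 1) > table.length)
            enlargeHashTable();
         int index = key % table.length;
         while (table[index] != null)              // resolve collisions by linear probing
            index = (index + 1) % table.length;
         table[index] = key;
         numberOfEntries++;
      }

      // Rehashing: create a larger table and re-add every entry so that its
      // hash index is computed with the new table size.
      private void enlargeHashTable()
      {
         Integer[] oldTable = table;
         table = new Integer[getNextPrime(2 * oldTable.length)];  // double, then next prime
         numberOfEntries = 0;
         for (Integer key : oldTable)
            if (key != null)
               add(key);                           // add handles any new collisions
      }

      // Returns the smallest prime >= n; adequate for a sketch.
      private static int getNextPrime(int n)
      {
         while (!isPrime(n))
            n++;
         return n;
      }

      private static boolean isPrime(int n)
      {
         if (n < 2)
            return false;
         for (int d = 2; d * d <= n; d++)
            if (n % d == 0)
               return false;
         return true;
      }

      public static void main(String[] args)
      {
         RehashDemo demo = new RehashDemo();
         for (int key : new int[] {42, 7, 52, 17, 9})
            demo.add(key);                         // the third addition forces a rehash
         System.out.println(Arrays.toString(demo.table));
      }
   }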
Question 3 Consider a hash table of size 5. The function c % 5 places the hash codes 20, 6, 18, and 14 into locations at indices 0, 1, 3, and 4, respectively. Show the effects of rehashing on this hash table when linear probing resolves collisions.
Note: Dynamic hashing allows a hash table to grow or shrink in size without the expense of rehashing. This technique, which we will not cover, is particularly useful in database management environments when the database is stored in external files.
22.8 Comparing Schemes for Collision Resolution
In previous segments, you saw how the load factor λ affects the average number of comparisons required by a search of a hash table for various ways to resolve collisions. The graphs in Figure 22-4 illustrate this effect for various collision resolution schemes. When λ is less than 0.5, the average number of comparisons for a successful search is about the same regardless of the process used to resolve collisions. For unsuccessful searches, the three open addressing schemes have about the same efficiency when λ is less than 0.5. However, separate chaining is somewhat more efficient in this case.

As λ exceeds 0.5, the efficiency of open addressing degrades rapidly, with linear probing the least efficient. Separate chaining, on the other hand, remains efficient for values of λ up to 1. In fact, the tabulated data in Figure 22-3 shows that its efficiency degrades only slightly for λ between 1 and 2.
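To make this trend concrete, the short program below prints rough estimates of the average number of comparisons for an unsuccessful search at several load factors. The formulas it uses, about ½(1 + 1/(1 − λ)²) for linear probing and about λ for separate chaining, come from the usual analysis of hashing rather than from this excerpt, so treat the output as an illustration of the shape of the curves in Figure 22-4, not as the figure's exact data.

   // Prints rough estimates of the comparisons needed by an unsuccessful search.
   public class LoadFactorEstimates
   {
      public static void main(String[] args)
      {
         System.out.println("lambda   linear probing   separate chaining");
         for (double lambda : new double[] {0.25, 0.5, 0.75, 0.9})
         {
            // Linear probing, unsuccessful search: about 0.5 * (1 + 1 / (1 - lambda)^2)
            double linearProbing = 0.5 * (1.0 + 1.0 / ((1.0 - lambda) * (1.0 - lambda)));
            // Separate chaining, unsuccessful search: about lambda comparisons
            double separateChaining = lambda;
            System.out.printf("%.2f     %8.2f         %8.2f%n",
                              lambda, linearProbing, separateChaining);
         }
      }
   }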
Separate chaining certainly appears to be the fastest approach. But separate chaining can require more memory than open addressing, since each location in the hash table can reference a chain of linked nodes. On the other hand, the hash table itself can be smaller than when you use an open addressing scheme, since λ can be larger. Thus, space need not be a deciding factor in how you resolve collisions.

If all of these collision resolution schemes used hash tables of equal size, open addressing would be more likely to lead to rehashing than separate chaining would. To reduce the likelihood of rehashing, an open addressing strategy could use a large hash table. Remember that in Java, the table contains references that do not require much memory. So even though at least half of the table must remain unused, the actual space allocation would not be excessive.
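As an aside, the same trade-off is visible in the standard class java.util.HashMap, which resolves collisions with a form of separate chaining: its constructor lets you choose the initial capacity and the load-factor threshold that triggers resizing, so supplying a generous capacity up front is one way to reduce rehashing at the cost of some unused table slots. The sketch below simply contrasts the default settings with an oversized table; the specific numbers are illustrative.

   import java.util.HashMap;
   import java.util.Map;

   // Contrasts HashMap's default sizing with a table sized to avoid rehashing.
   public class CapacityDemo
   {
      public static void main(String[] args)
      {
         // Defaults: initial capacity 16 and load-factor threshold 0.75; the table
         // is resized, and its entries rehashed, whenever the threshold is exceeded.
         Map<String, Integer> defaults = new HashMap<>();

         // A generous initial capacity and a lower threshold trade memory for
         // fewer (here, zero) rehashing passes while the map holds 500 entries.
         Map<String, Integer> roomy = new HashMap<>(1024, 0.5f);

         for (int i = 0; i < 500; i++)
         {
            defaults.put("key" + i, i);   // causes several internal resizes
            roomy.put("key" + i, i);      // stays under 0.5 * 1024 = 512 entries
         }
         System.out.println(defaults.size() + " " + roomy.size());
      }
   }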