from the original table to the location at index 0 in the new table. And you cannot copy it to the location at index 83 in the new table, because you also need to consider collisions.

After creating a new, larger hash table of an appropriate size, you use the dictionary method add to add each item in the original hash table to the new table. The method computes the hash index using the size of the new table and handles any collisions. This process of enlarging a hash table and computing new hash indices for its contents is called rehashing. You can see that increasing the size of a hash table requires considerably more work than increasing the size of an ordinary array. Rehashing is a task that you should not do often.
Note: Rehashing
When the load factor λ becomes too large, resize the hash table. To compute the table's new size, first double its present size and then increase the result to the next prime number. Use the method add to add the current entries in the dictionary to the new hash table.
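The following Java sketch illustrates this rehashing process; the class and method names used here (RehashDemo, enlargeHashTable, getNextPrime) are illustrative only and are not taken from the chapter's dictionary implementation. When an addition would push the load factor past 0.5, the table is replaced by one whose length is the next prime after twice the old length, and each old entry is re-added with add so that its new hash index and any collisions are handled automatically.

   import java.util.Arrays;

   // A minimal sketch of rehashing for a linear-probing table of Integer keys.
   public class RehashDemo
   {
      private Integer[] table = new Integer[5];
      private int numberOfEntries = 0;

      // Adds a key, enlarging the table first if the addition would push the
      // load factor above 0.5; the hash index uses the current table length.
      public void add(int key)
      {
         if (2 * (numberOfEntries + 1) > table.length)
            enlargeHashTable();
         int index = key % table.length;
         while (table[index] != null)              // resolve collisions by linear probing
            index = (index + 1) % table.length;
         table[index] = key;
         numberOfEntries++;
      }

      // Rehashing: create a larger table and re-add every entry so that its
      // hash index is computed with the new table size.
      private void enlargeHashTable()
      {
         Integer[] oldTable = table;
         table = new Integer[getNextPrime(2 * oldTable.length)];  // double, then next prime
         numberOfEntries = 0;
         for (Integer key : oldTable)
            if (key != null)
               add(key);                           // add handles any new collisions
      }

      // Returns the smallest prime >= n; adequate for a sketch.
      private static int getNextPrime(int n)
      {
         while (!isPrime(n))
            n++;
         return n;
      }

      private static boolean isPrime(int n)
      {
         if (n < 2)
            return false;
         for (int d = 2; d * d <= n; d++)
            if (n % d == 0)
               return false;
         return true;
      }

      public static void main(String[] args)
      {
         RehashDemo demo = new RehashDemo();
         for (int key : new int[] {42, 7, 52, 17, 9})
            demo.add(key);                         // the third addition forces a rehash
         System.out.println(Arrays.toString(demo.table));
      }
   }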
Question 3 Consider a hash table of size 5. The function c % 5 places the hash codes 20, 6, 18, and 14 into locations at indices 0, 1, 3, and 4, respectively. Show the effects of rehashing on this hash table when linear probing resolves collisions.
Note: Dynamic hashing allows a hash table to grow or shrink in size without the expense of rehashing. This technique, which we will not cover, is particularly useful in database management environments when the database is stored in external files.
22.8 Comparing Schemes for Collision Resolution
In previous segments, you saw how the load factor λ affects the average number of comparisons required by a search of a hash table for various ways to resolve collisions. The graphs in Figure 22-4 illustrate this effect for various collision resolution schemes. When λ is less than 0.5, the average number of comparisons for a successful search is about the same regardless of the process used to resolve collisions. For unsuccessful searches, the three open addressing schemes have about the same efficiency when λ is less than 0.5. However, separate chaining is somewhat more efficient in this case.

As λ exceeds 0.5, the efficiency of open addressing degrades rapidly, with linear probing the least efficient. Separate chaining, on the other hand, remains efficient for values of λ up to 1. In fact, the tabulated data in Figure 22-3 shows that its efficiency degrades only slightly for λ between 1 and 2.
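To make this trend concrete, the short program below prints rough estimates of the average number of comparisons for an unsuccessful search at several load factors. The formulas it uses, about ½(1 + 1/(1 − λ)²) for linear probing and about λ for separate chaining, come from the usual analysis of hashing rather than from this excerpt, so treat the output as an illustration of the shape of the curves in Figure 22-4, not as the figure's exact data.

   // Prints rough estimates of the comparisons needed by an unsuccessful search.
   public class LoadFactorEstimates
   {
      public static void main(String[] args)
      {
         System.out.println("lambda   linear probing   separate chaining");
         for (double lambda : new double[] {0.25, 0.5, 0.75, 0.9})
         {
            // Linear probing, unsuccessful search: about 0.5 * (1 + 1 / (1 - lambda)^2)
            double linearProbing = 0.5 * (1.0 + 1.0 / ((1.0 - lambda) * (1.0 - lambda)));
            // Separate chaining, unsuccessful search: about lambda comparisons
            double separateChaining = lambda;
            System.out.printf("%.2f     %8.2f         %8.2f%n",
                              lambda, linearProbing, separateChaining);
         }
      }
   }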
Separate chaining certainly appears to be the fastest approach. But separate chaining can require more memory than open addressing, since each location in the hash table can reference a chain of linked nodes. On the other hand, the hash table itself can be smaller than when you use an open addressing scheme, since λ can be larger. Thus, space need not be a deciding factor in how you resolve collisions.

If all of these collision resolution schemes used hash tables of equal size, open addressing would be more likely to lead to rehashing than separate chaining would. To reduce the likelihood of rehashing, an open addressing strategy could use a large hash table. Remember that in Java, the table contains references that do not require much memory. So even though at least half of the table must remain unused, the actual space allocation would not be excessive.
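As an aside, the same trade-off is visible in the standard class java.util.HashMap, which resolves collisions with a form of separate chaining: its constructor lets you choose the initial capacity and the load-factor threshold that triggers resizing, so supplying a generous capacity up front is one way to reduce rehashing at the cost of some unused table slots. The sketch below simply contrasts the default settings with an oversized table; the specific numbers are illustrative.

   import java.util.HashMap;
   import java.util.Map;

   // Contrasts HashMap's default sizing with a table sized to avoid rehashing.
   public class CapacityDemo
   {
      public static void main(String[] args)
      {
         // Defaults: initial capacity 16 and load-factor threshold 0.75; the table
         // is resized, and its entries rehashed, whenever the threshold is exceeded.
         Map<String, Integer> defaults = new HashMap<>();

         // A generous initial capacity and a lower threshold trade memory for
         // fewer (here, zero) rehashing passes while the map holds 500 entries.
         Map<String, Integer> roomy = new HashMap<>(1024, 0.5f);

         for (int i = 0; i < 500; i++)
         {
            defaults.put("key" + i, i);   // causes several internal resizes
            roomy.put("key" + i, i);      // stays under 0.5 * 1024 = 512 entries
         }
         System.out.println(defaults.size() + " " + roomy.size());
      }
   }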