Java Reference
In-Depth Information
This phenomenon of clustering is one of the main drawbacks of linear probing. Long chains tend to get longer
since the probability of hashing to a long chain is usually greater than that of hashing to a short chain. It is also easy
for two short chains to be joined, creating a longer chain that, in turn, will tend to get longer. For example, any key that
ends up in num[7] will create a long chain from locations 5 to 10 .
We define two types of clustering.
Primary clustering occurs when keys that hash to different locations trace the same sequence
in looking for an empty location. Linear probing exhibits primary clustering since a key that
hashes to 5 , say, will trace 5 , 6 , 7 , 8 , 9 , and so on, and a key that hashes to 6 will trace 6 , 7 , 8 , 9 ,
and so on.
Secondary clustering occurs when keys that hash to the same location trace the same sequence
in looking for an empty location. Linear probing exhibits secondary clustering since keys that
hash to 5 , say, will trace the same sequence 5 , 6 , 7 , 8 , 9 , and so on.
Methods of resolving collisions that hope to improve on linear probing will target the elimination of primary
and/or secondary clustering.
You may wonder if using loc = loc + k where k is a constant greater than 1 (for example, 3 ) will give any better
results than loc = loc + 1 . As it turns out, this will not alter the clustering phenomenon since groups of k -apart keys
will still be formed.
In addition, it can even be worse than when k is 1 since it is possible that not all locations will be generated.
Suppose the table size is 12 , k is 3 , and a key hashes to 5 . The sequence of locations traced will be 5 , 8 , 11 , 2
( 11 + 3 - 12 ), 5 , and the sequence repeats itself. In other words, only relatively few locations will be probed in
looking for an empty location. By comparison, when k is 1 , all locations are generated.
However, this is not really a problem. If the table size is m and k is “relatively prime” to m (their only common
factor is 1), then all locations are generated. Two numbers will be relatively prime if one is a prime and the other is
not a multiple of it, such as 5 and 12. But being prime is not a necessary condition. The numbers 21 and 50 (neither of
which is prime) are relatively prime since they have no common factors other than 1.
If k is 5 and m is 12 , a key hashing to 5 will trace the sequence 5 , 10 , 3 , 8 , 1 , 6 , 11 , 4 , 9 , 2 , 7 , 12 —all locations are
generated. A key hashing to any other location will also generate all locations.
In any case, being able to generate all locations is academic since if we had to trace many locations to find an
empty one, the search would be too slow, and we would probably need to use another method.
Notwithstanding what we've just said, it turns out that loc = loc + k , where k varies with the key, gives us one of
the best ways to implement hashing. We will see how in Section 10.3.4.
So, how fast is the linear method? We are interested in the average search length , that is, the number of locations
that must be examined to find or insert a given key. In the previous example, the search length of 33 is 1 , the search
length of 61 is 2 , and the search length of 23 is 3 .
The search length is a function of the load factor , f , of the table, where
number of entries in table
f= number of table locations =
fraction of table filled
For a successful search, the average number of comparisons is 1 1
, and for an unsuccessful search,
2 1f
+
1
1
1+
the average number of comparisons is
. Note that the search length depends only on the fraction of the
( ) 2
2
1-f
table filled, not on the table size.
Table 10-1 shows how the search length increases as the table fills up.
 
 
Search WWH ::




Custom Search