Database Reference
In-Depth Information
ments of length a , as suggested by Fig. 3.13 , where the “random” line has been oriented to
be horizontal.
Figure 3.13 Two points at distance d a have a small chance of being hashed to the same bucket
The segments of the line are the buckets into which function f hashes points. A point is
hashed to the bucket in which its projection onto the line lies. If the distance d between
two points is small compared with a , then there is a good chance the two points hash to the
same bucket, and thus the hash function f will declare the two points equal. For example, if
d = a /2, then there is at least a 50% chance the two points will fall in the same bucket. In
fact, if the angle θ between the randomly chosen line and the line connecting the points is
large, then there is an even greater chance that the two points will fall in the same bucket.
For instance, if θ is 90 degrees, then the two points are certain to fall in the same bucket.
However, suppose d is larger than a . In order for there to be any chance of the two points
falling in the same bucket, we need d cos θ a . The diagram of Fig. 3.13 suggests why this
requirement holds. Note that even if d cos θ a it is still not certain that the two points
will fall in the same bucket. However, we can guarantee the following. If d ≥ 2 a , then there
is no more than a 1/3 chance the two points fall in the same bucket. The reason is that for
cos θ to be less than 1/2, we need to have θ in the range 60 to 90 degrees. If θ is in the range
0 to 60 degrees, then cos θ is more than 1/2. But since θ is the smaller angle between two
randomly chosen lines in the plane, θ is twice as likely to be between 0 and 60 as it is to be
between 60 and 90.
We conclude that the family F just described forms a ( a /2 , 2 a, 1/2 , 1/3)-sensitive family
of hash functions. That is, for distances up to a /2 the probability is at least 1/2 that two
points at that distance will fall in the same bucket, while for distances at least 2 a the prob-
ability points at that distance will fall in the same bucket is at most 1/3. We can amplify
this family as we like, just as for the other examples of locality-sensitive hash functions we
have discussed.
Search WWH ::




Custom Search