Database Reference
In-Depth Information
Suppose we pick a hyperplane through the origin. This hyperplane intersects the plane of
x and y in a line. Figure 3.12 suggests two possible hyperplanes, one whose intersection is
the dashed line and the other's intersection is the dotted line. To pick a random hyperplane,
we actually pick the normal vector to the hyperplane, say v . The hyperplane is then the set
of points whose dot product with v is 0.
First, consider a vector v that is normal to the hyperplane whose projection is represented
by the dashed line in Fig. 3.12 ; that is, x and y are on different sides of the hyperplane. Then
the dot products v.x and v.y will have different signs. If we assume, for instance, that v is
a vector whose projection onto the plane of x and y is above the dashed line in Fig. 3.12 ,
then v.x is positive, while v.y is negative. The normal vector v instead might extend in the
opposite direction, below the dashed line. In that case v.x is negative and v.y is positive, but
the signs are still different.
On the other hand, the randomly chosen vector v could be normal to a hyperplane like
the dotted line in Fig. 3.12 . In that case, both v.x and v.y have the same sign. If the projec-
tion of v extends to the right, then both dot products are positive, while if v extends to the
left, then both are negative.
What is the probability that the randomly chosen vector is normal to a hyperplane that
looks like the dashed line rather than the dotted line? All angles for the line that is the in-
tersection of the random hyperplane and the plane of x and y are equally likely. Thus, the
hyperplane will look like the dashed line with probability θ /180 and will look like the dot-
ted line otherwise.
Thus, each hash function f in our locality-sensitive family F is built from a randomly
chosen vector v f . Given two vectors x and y , say f ( x ) = f ( y ) if and only if the dot products
v f .x and v f .y have the same sign. Then F is a locality-sensitive family for the cosine dis-
tance. The parameters are essentially the same as for the Jaccard-distance family described
in Section 3.6.2 , except the scale of distances is 0-180 rather than 0-1. That is, F is a
( d 1 , d 2 , (180 − d 1 )/180 , (180 − d 2 )/180)-sensitive
family of hash functions. From this basis, we can amplify the family as we wish, just as for
the minhash-based family.
3.7.3
Sketches
Instead of chosing a random vector from all possible vectors, it turns out to be sufficiently
random if we restrict our choice to vectors whose components are +1 and −1. The dot
product of any vector x with a vector v of +1's and −1's is formed by adding the compon-
Search WWH ::




Custom Search