Databases Reference
In-Depth Information
x
θ
y
Figure 3.12: Two vectors make an angle θ
Suppose we pick a hyperplane through the origin. This hyperplane intersects
the plane of x and y in a line. Figure 3.12 suggests two possible hyperplanes,
one whose intersection is the dashed line and the other's intersection is the
dotted line. To pick a random hyperplane, we actually pick the normal vector
to the hyperplane, say v. The hyperplane is then the set of points whose dot
product with v is 0.
First, consider a vector v that is normal to the hyperplane whose projection
is represented by the dashed line in Fig. 3.12; that is, x and y are on different
sides of the hyperplane. Then the dot products v.x and v.y will have different
signs. If we assume, for instance, that v is a vector whose projection onto the
plane of x and y is above the dashed line in Fig. 3.12, then v.x is positive,
while v.y is negative. The normal vector v instead might extend in the opposite
direction, below the dashed line. In that case v.x is negative and v.y is positive,
but the signs are still different.
On the other hand, the randomly chosen vector v could be normal to a
hyperplane like the dotted line in Fig. 3.12. In that case, both v.x and v.y
have the same sign. If the projection of v extends to the right, then both dot
products are positive, while if v extends to the left, then both are negative.
What is the probability that the randomly chosen vector is normal to a
hyperplane that looks like the dashed line rather than the dotted line? All
angles for the line that is the intersection of the random hyperplane and the
plane of x and y are equally likely. Thus, the hyperplane will look like the
dashed line with probability θ/180 and will look like the dotted line otherwise.
Thus, each hash function f in our locality-sensitive family F is built from
a randomly chosen vector v f . Given two vectors x and y, say f (x) = f (y) if
and only if the dot products v f .x and v f .y have the same sign. Then F is a
locality-sensitive family for the cosine distance. The parameters are essentially
Search WWH ::




Custom Search