Database Reference
In-Depth Information
ents of x where v is +1 and then subtracting the other components of x - those where v is
−1.
If we pick a collection of random vectors, say v 1 , v 2 , . . . , v n , then we can apply them
to an arbitrary vector x by computing v 1 .x, v 2 .x, . . . , v n .x and then replacing any positive
value by +1 and any negative value by −1. The result is called the sketch of x . You can
handle 0's arbitrarily, e.g., by chosing a result +1 or −1 at random. Since there is only a tiny
probability of a zero dot product, the choice has essentially no effect.
EXAMPLE 3.21 Suppose our space consists of 4-dimensional vectors, and we pick three ran-
dom vectors: v 1 = [+1 , −1 , +1 , +1], v 2 = [−1 , +1 , −1 , +1], and v 3 = [+1 , +1 , −1 , −1]. For the
vector x = [3 , 4 , 5 , 6], the sketch is [+1 , +1 , −1]. That is, v 1 .x = 3−4+5+6 = 10. Since the
result is positive, the first component of the sketch is +1. Similarly, v 2 .x = 3 and v 3 .x = −4,
so the second component of the sketch is +1 and the third component is −1.
Consider the vector y = [4 , 3 , 2 , 1]. We can similarly compute its sketch to be [+1 , −1 ,
+1]. Since the sketches for x and y agree in 1/3 of the positions, we estimate that the angle
between them is 120 degrees. That is, a randomly chosen hyperplane is twice as likely to
look like the dashed line in Fig. 3.12 than like the dotted line.
The above conclusion turns out to be quite wrong. We can calculate the cosine of the
angle between x and y to be x.y , which is
6 × 1 + 5 × 2 + 4 × 3 + 3 × 4 = 40
divided by the magnitudes of the two vectors. These magnitudes are
and Thus, the cosine of the angle between x and y is 0.7875, and this angle is
about 38 degrees. However, if you look at all 16 different vectors v of length 4 that have +1
and −1 as components, you find that there are only four of these whose dot products with x
and y have a different sign, namely v 2 , v 3 , and their complements [+1 , −1 , +1 , −1] and [−1 ,
−1 , +1 , +1]. Thus, had we picked all sixteen of these vectors to form a sketch, the estimate
of the angle would have been 180/4 = 45 degrees.
3.7.4
LSH Families for Euclidean Distance
Now, let us turn to the Euclidean distance ( Section 3.5.2 ), and see if we can develop
a locality-sensitive family of hash functions for this distance. We shall start with a
2-dimensional Euclidean space. Each hash function f in our family F will be associated
with a randomly chosen line in this space. Pick a constant a and divide the line into seg-
Search WWH ::




Custom Search