Databases Reference
In-Depth Information
the same as for the Jaccard-distance family described in Section 3.6.2, except
the scale of distances is 0-180 rather than 0-1. That is, F is a
(d 1 , d 2 , (180−d 1 )/180, d 2 /180)-sensitive
family of hash functions. From this basis, we can amplify the family as we wish,
just as for the minhash-based family.
3.7.3 Sketches
Instead of chosing a random vector from all possible vectors, it turns out to be
su ciently random if we restrict our choice to vectors whose components are
+1 and−1. The dot product of any vector x with a vector v of +1's and−1's
is formed by adding the components of x where v is +1 and then subtracting
the other components of x - those where v is−1.
If we pick a collection of random vectors, say v 1 , v 2 , . . . , v n , then we can
apply them to an arbitrary vector x by computing v 1 .x, v 2 .x, . . . , v n .x and then
replacing any positive value by +1 and any negative value by−1. The result is
called the sketch of x. You can handle 0's arbitrarily, e.g., by chosing a result +1
or−1 at random. Since there is only a tiny probability of a zero dot product,
the choice has essentially no effect.
Example 3.21 : Suppose our space consists of 4-dimensional vectors, and we
pick three random vectors: v 1 = [+1,−1, +1, +1], v 2 = [−1, +1,−1, +1], and
v 3 = [+1, +1,−1,−1]. For the vector x = [3, 4, 5, 6], the sketch is [+1, +1,−1].
That is, v 1 .x = 3−4+5+6 = 10. Since the result is positive, the first component
of the sketch is +1. Similarly, v 2 .x = 3 and v 3 .x =−4, so the second component
of the sketch is +1 and the third component is−1.
Consider the vector y = [4, 3, 2, 1]. We can similarly compute its sketch to
be [+1,−1, +1]. Since the sketches for x and y agree in 1/3 of the positions,
we estimate that the angle between them is 120 degrees. That is, a randomly
chosen hyperplane is twice as likely to look like the dashed line in Fig. 3.12 than
like the dotted line.
The above conclusion turns out to be quite wrong.
We can calculate the
cosine of the angle between x and y to be x.y, which is
6×1 + 5×2 + 4×3 + 3×4 = 40
divided by the magnitudes of the two vectors. These magnitudes are
6 2 + 5 2 + 4 2 + 3 2 = 9.274
1 2 + 2 2 + 3 2 + 4 2 = 5.477. Thus, the cosine of the angle between x and
y is 0.7875, and this angle is about 38 degrees. However, if you look at all
16 different vectors v of length 4 that have +1 and−1 as components, you
find that there are only four of these whose dot products with x and y have
a different sign, namely v 2 , v 3 , and their complements [+1,−1, +1,−1] and
[−1,−1, +1, +1]. Thus, had we picked all sixteen of these vectors to form a
sketch, the estimate of the angle would have been 180/4 = 45 degrees.
and
2
Search WWH ::




Custom Search