Databases Reference
In-Depth Information
3.6.3 Amplifying a Locality-Sensitive Family
Suppose we are given a (d 1 , d 2 , p 1 , p 2 )-sensitive family F. We can construct a
new family F by the AND-construction on F, which is defined as follows. Each
member of F consists of r members of F for some fixed r. If f is in F , and f is
constructed from the set{f 1 , f 2 , . . . , f r
}of members of F, we say f (x) = f (y)
if and only if f i (x) = f i (y) for all i = 1, 2, . . . , r. Notice that this construction
mirrors the effect of the r rows in a single band: the band makes x and y a
candidate pair if every one of the r rows in the band say that x and y are equal
(and therefore a candidate pair according to that row).
Since the members of F are independently chosen to make a member of F ,
we can assert that F is a
-sensitive family. That is, for any
p, if p is the probability that a member of F will declare (x, y) to be a candidate
pair, then the probability that a member of F will so declare is p r .
There is another construction, which we call the OR-construction, that turns
a (d 1 , d 2 , p 1 , p 2 )-sensitive family F into a
d 1 , d 2 , (p 1 ) r , (p 2 ) r
-
sensitive family F . Each member f of F is constructed from b members of F,
say f 1 , f 2 , . . . , f b . We define f (x) = f (y) if and only if f i (x) = f i (y) for one or
more values of i. The OR-construction mirrors the effect of combining several
bands: x and y become a candidate pair if any band makes them a candidate
pair.
If p is the probability that a member of F will declare (x, y) to be a candidate
pair, then 1−p is the probability it will not so declare. (1−p) b is the probability
that none of f 1 , f 2 , . . . , f b will declare (x, y) a candidate pair, and 1−(1−p) b
is the probability that at least one f i will declare (x, y) a candidate pair, and
therefore that f will declare (x, y) to be a candidate pair.
Notice that the AND-construction lowers all probabilities, but if we choose F
and r judiciously, we can make the small probability p 2 get very close to 0, while
the higher probability p 1 stays significantly away from 0. Similarly, the OR-
construction makes all probabilities rise, but by choosing F and b judiciously,
we can make the larger probability approach 1 while the smaller probability
remains bounded away from 1. We can cascade AND- and OR-constructions in
any order to make the low probability close to 0 and the high probability close
to 1. Of course the more constructions we use, and the higher the values of r
and b that we pick, the larger the number of functions from the original family
that we are forced to use. Thus, the better the final family of functions is, the
longer it takes to apply the functions from this family.
d 1 , d 2 , 1−(1−p 1 ) b , 1−(1−p 2 ) b
Example 3.18 : Suppose we start with a family F. We use the AND-construc-
tion with r = 4 to produce a family F 1 . We then apply the OR-construction
to F 1 with b = 4 to produce a third family F 2 . Note that the members of F 2
each are built from 16 members of F, and the situation is analogous to starting
with 16 minhash functions and treating them as four bands of four rows each.
The 4-way AND-function converts any probability p into p 4 . When we
follow it by the 4-way OR-construction, that probability is further converted
into 1−(1−p 4 ) 4 . Some values of this transformation are indicated in Fig. 3.10.
Search WWH ::




Custom Search