Database Reference
In-Depth Information
3.6.4
Exercises for Section 3.6
EXERCISE 3.6.1 What is the effect on probability of starting with the family of minhash
functions and applying:
(a) A 2-way AND construction followed by a 3-way OR construction.
(b) A 3-way OR construction followed by a 2-way AND construction.
(c) A 2-way AND construction followed by a 2-way OR construction, followed by a
2-way AND construction.
(d) A 2-way OR construction followed by a 2-way AND construction, followed by a
2-way OR construction followed by a 2-way AND construction.
EXERCISE 3.6.2 Find the fixedpoints for each of the functions constructed in Exercise 3.6.1 .
! EXERCISE 3.6.3 Any function of probability p , such as that of Fig. 3.10 , has a slope given
by the derivative of the function. The maximum slope is where that derivative is a maxim-
um. Find the value of p that gives a maximum slope for the S-curves given by Fig. 3.10 and
Fig. 3.11 . What are the values of these maximum slopes?
!! EXERCISE 3.6.4 Generalize Exercise 3.6.3 to give, as a function of r and b , the point of
maximum slope and the value of that slope, for families of functions defined from the min-
hash functions by:
(a) An r -way AND construction followed by a b -way OR construction.
(b) A b -way OR construction followed by an r -way AND construction.
3.7 LSH Families for Other Distance Measures
There is no guarantee that a distance measure has a locality-sensitive family of hash func-
tions. So far, we have only seen such families for the Jaccard distance. In this section, we
shall show how to construct locality-sensitive families for Hamming distance, the cosine
distance and for the normal Euclidean distance.
3.7.1
LSH Families for Hamming Distance
It is quite simple to build a locality-sensitive family of functions for the Hamming distance.
Suppose we have a space of d -dimensional vectors, and h ( x, y ) denotes the Hamming dis-
Search WWH ::




Custom Search