Database Reference
In-Depth Information
3.6.4
Exercises for Section 3.6
EXERCISE
3.6.1 What is the effect on probability of starting with the family of minhash
functions and applying:
(a) A 2-way AND construction followed by a 3-way OR construction.
(b) A 3-way OR construction followed by a 2-way AND construction.
(c) A 2-way AND construction followed by a 2-way OR construction, followed by a
2-way AND construction.
(d) A 2-way OR construction followed by a 2-way AND construction, followed by a
2-way OR construction followed by a 2-way AND construction.
EXERCISE
3.6.2 Find the fixedpoints for each of the functions constructed in
Exercise 3.6.1
.
by the derivative of the function. The maximum slope is where that derivative is a maxim-
um. Find the value of
p
that gives a maximum slope for the S-curves given by
Fig. 3.10
and
maximum slope and the value of that slope, for families of functions defined from the min-
hash functions by:
(a) An
r
-way AND construction followed by a
b
-way OR construction.
(b) A
b
-way OR construction followed by an
r
-way AND construction.
3.7 LSH Families for Other Distance Measures
There is no guarantee that a distance measure has a locality-sensitive family of hash func-
tions. So far, we have only seen such families for the Jaccard distance. In this section, we
shall show how to construct locality-sensitive families for Hamming distance, the cosine
distance and for the normal Euclidean distance.
3.7.1
LSH Families for Hamming Distance
It is quite simple to build a locality-sensitive family of functions for the Hamming distance.
Suppose we have a space of
d
-dimensional vectors, and
h
(
x, y
) denotes the Hamming dis-