Database Reference
In-Depth Information
the mistake of assuming the average difference in publication date between random pairs
is 5 or 5.5. You need to calculate it exactly, and you have enough information to do so.
EXERCISE 3.8.2 Suppose we use the family F of functions described in Section 3.8.5 , where
there is a 20% chance of a minutia in an grid square, an 80% chance of a second copy of a
fingerprint having a minutia in a grid square where the first copy does, and each function
in F being formed from three grid squares. In Example 3.22 , we constructed family F 1 by
using the OR construction on 1024 members of F . Suppose we instead used family F 2 that
is a 2048-way OR of members of F .
(a) Compute the rates of false positives and false negatives for F 2 .
(b) How do these rates compare with what we get if we organize the same 2048 functions
into a 2-way AND of members of F 1 , as was discussed at the end of Section 3.8.5 ?
EXERCISE 3.8.3 Suppose fingerprints have the same statistics outlined in Exercise 3.8.2 ,
but we use a base family of functions F ′ defined like F , but using only two randomly chosen
grid squares. Construct another set of functions from F ′ by taking the n -way OR of
functions from F ′. What, as a function of n , are the false positive and false negative rates
for
EXERCISE 3.8.4 Suppose we use the functions F 1 from Example 3.22 , but we want to solve
the many-many problem.
(a) If two fingerprints are from the same finger, what is the probability that they will not
be compared (i.e., what is the false negative rate)?
(b) What fraction of the fingerprints from different fingers will be compared (i.e., what is
the false positive rate)?
! EXERCISE 3.8.5 Assume we have the set of functions F as in Exercise 3.8.2 , and we con-
struct a new set of functions F 3 by an n -way OR of functions in F . For what value of n is
the sum of the false positive and false negative rates minimized?
3.9 Methods for High Degrees of Similarity
LSH-based methods appear most effective when the degree of similarity we accept is relat-
ively low. When we want to find sets that are almost identical, there are other methods that
can be faster. Moreover, these methods are exact, in that they find every pair of items with
the desired degree of similarity. There are no false negatives, as there can be with LSH.
Search WWH ::




Custom Search