Database Reference
In-Depth Information
3.9.7
Exercises for Section 3.9
EXERCISE 3.9.1 Suppose our universal set is the lower-case letters, and the order of ele-
ments is taken to be the vowels, in alphabetic order, followed by the consonants in reverse
alphabetic order. Represent the following sets as strings.
(a) { q, w, e, r, t, y }.
(b) { a, s, d, f, g, h, j, u, i }.
EXERCISE 3.9.2 Suppose we filter candidate pairs based only on length, as in Section 3.9.3 .
If s is a string of length 20, with what strings is s compared when J , the lower bound on
Jaccard similarity has the following values: (a) J = 0 . 85 (b) J = 0 . 95 (c) J = 0 . 98?
EXERCISE 3.9.3 Suppose we have a string s of length 15, and we wish to index its prefix as
in Section 3.9.4 .
(a) How many positions are in the prefix if J = 0 . 85?
(b) How many positions are in the prefix if J = 0 . 95?
! (c) For what range of values of J will s be indexed under its first four symbols, but no
more?
EXERCISE 3.9.4 Suppose s is a string of length 12. With what symbol-position pairs will s
be compared with if we use the indexing approach of Section 3.9.5 , and (a) J = 0 . 75 (b) J
= 0 . 95?
! EXERCISE 3.9.5 Suppose we use position information in our index, as in Section 3.9.5 .
Strings s and t are both chosen at random from a universal set of 100 elements. Assume J
= 0 . 9. What is the probability that s and t will be compared if
(a) s and t are both of length 9.
(b) s and t are both of length 10.
EXERCISE 3.9.6 Suppose we use indexes based on both position and suffix length, as in
Section 3.9.6 . If s is a string of length 20, with what symbol-position-length triples will s
be compared with, if (a) J = 0 . 8 (b) J = 0 . 9?
Search WWH ::




Custom Search