Databases Reference
In-Depth Information
whenever p≥q.
Case 2: p < q. Here, the maximum size of the intersection is L s −i + 1, as
when su x length was not considered. However, the minimum size of the union
is now L s + j−1 + q−p. If we again use the relationship L s = i + p, we can
replace L s −p by i and get the formula i + j−1 + q for the size of the union.
If the Jaccard similarity is at least J , then
L s −i + 1
i + j−1 + q
≥J
whenever p < q.
Example 3.29 : Let us again consider the string s = acdefghijk , but to make
the example show some details, let us choose J = 0.8 instead of 0.9. We know
that L s = 10.
⌋+ 1 = 3, we must consider prefix positions
i = 1, 2, and 3 in what follows. As before, let p be the su x length of s and q
the su x length of t.
First, consider the case p≥q. The additional constraint we have on q and
j is (q + 1)/(9 + j)≥0.8. We can enumerate the pairs of values of j and q for
each i between 1 and 3, as follows.
Since⌊(1−J )L s
i = 1: Here, p = 9, so q≤9. Let us consider the possible values of q:
q = 9: We must have 10/(9 + j)≥0.8. Thus, we can have j = 1, j = 2,
or j = 3. Note that for j = 4, 10/13 > 0.8.
q = 8: We must have 9/(9 + j)≥0.8. Thus, we can have j = 1 or j = 2.
For j = 3, 9/12 > 0.8.
q = 7: We must have 8/(9 + j)≥0.8. Only j = 1 satisfies this inequality.
q = 6: There are no possible values of j, since 7/(9 + j) > 0.8 for every
positive integer j. The same holds for every smaller value of q.
i = 2: Here, p = 8, so we require q≤8. Since the constraint (q + 1)/(9 + j≥0.8
does not depend on i, 6 we can use the analysis from the above case, but
exclude the case q = 9. Thus, the only possible values of j and q when
i = 2 are
1. q = 8; j = 1.
2. q = 8; j = 2.
3. q = 7; j = 1.
i = 3: Now, p = 7 and the constraints are q≤7 and (q + 1)/(9 + j)≥0.8. The
only option is q = 7 and j = 1.
6 Note that i does influence the value of p, and through p, puts a limit on q.
Search WWH ::




Custom Search