Database Reference
In-Depth Information
317
Proof. Let x = λ∈s∩v W ( λ ) be the weight of tokens in the intersec-
tion of strings s and v . Let y = λ∈s\v W ( λ ) be the weight of tokens
contained only in s and z = λ∈v\s W ( λ ) be the weight of tokens con-
tained only in v . Then
x
x + y + z .
J
( s, v )=
x
It suces to show that the function f ( x, y, z )=
x + y + z is monotone
increasing in x and monotone decreasing in y, for positive x, y, z . Then,
the proof is similar to that of Lemma 6.2 and 6.3.
x
Lemma 6.13. Function f ( x, y, z )=
x + y + z is monotone increasing in
x and monotone decreasing in y for all positive x, y, z .
Proof. Consider the function g ( x )=1 /f ( x ). Function f ( x ) is monotone
increasing iff g ( x ) is monotone decreasing.
g ( x )= x + y + z
x
=1+ y
x + z
x ,
which is clearly monotone decreasing in x for positive x, y, z . Also f ( y )
is monotone decreasing iff g ( y )=1 /f ( y ) is monotone increasing, which
is clearly true for positive x, y, z .
Lemma 6.14. Given query v , candidate string s , and a subset of list
L v
L v potentially containing s , the best-case similarity score for s is
v
1
b ( s, v )=
J
.
v
s
1 +
v
1
1
6.1.2.5
Weighted Intersection
Consider now Weighted Intersection similarity (i.e., without normal-
ization). We have seen how to evaluate weighted similarity using
the optimized multiway merge strategy. We now explore if the
threshold algorithms can be used effectively for the same purpose.
 
Search WWH ::




Custom Search