Database Reference
In-Depth Information
317
Proof.
Let
x
=
λ∈s∩v
W
(
λ
) be the weight of tokens in the intersec-
tion of strings
s
and
v
. Let
y
=
λ∈s\v
W
(
λ
) be the weight of tokens
contained only in
s
and
z
=
λ∈v\s
W
(
λ
) be the weight of tokens con-
tained only in
v
. Then
x
x
+
y
+
z
.
J
(
s, v
)=
x
It suces to show that the function
f
(
x, y, z
)=
x
+
y
+
z
is monotone
increasing in
x
and monotone decreasing in y, for positive
x, y, z
. Then,
the proof is similar to that of Lemma 6.2 and 6.3.
x
Lemma 6.13.
Function
f
(
x, y, z
)=
x
+
y
+
z
is monotone increasing in
x
and monotone decreasing in
y
for all positive
x, y, z
.
Proof.
Consider the function
g
(
x
)=1
/f
(
x
). Function
f
(
x
) is monotone
increasing iff
g
(
x
) is monotone decreasing.
g
(
x
)=
x
+
y
+
z
x
=1+
y
x
+
z
x
,
which is clearly monotone decreasing in
x
for positive
x, y, z
. Also
f
(
y
)
is monotone decreasing iff
g
(
y
)=1
/f
(
y
) is monotone increasing, which
is clearly true for positive
x, y, z
.
Lemma 6.14.
Given query
v
, candidate string
s
, and a subset of list
L
v
⊆
L
v
potentially containing
s
, the best-case similarity score for
s
is
v
1
b
(
s, v
)=
J
.
v
s
1
+
v
−
1
1
6.1.2.5
Weighted Intersection
Consider now Weighted Intersection similarity (i.e., without normal-
ization). We have seen how to evaluate weighted similarity using
the
optimized multiway merge
strategy. We now explore if the
threshold
algorithms can be used effectively for the same purpose.