Database Reference
In-Depth Information
1 ≤S θ
(2) Let
s
r
( r )
1 .
We
prove
by
contradiction.
Assume that
θ
θ +1 (
J
( s, r )
θ
s
r
s
1 +
r
1 ) .
1
By definition
≤S θ ( r )
s
r
s
r
1
r
1 .
1
1
Thus
θ
θ +1 (
s
1 +
r
1 )
r
s
1
r
1 .
1
But, from Lemma 6.11
J
1 ,
which is a contradiction. Hence J ( s, r ) .
( s, r )
θ
s
θ
r
1
Lemma 7.5 states that if we know that strings are processed in
increasing order of L 1 -norms, then for every string indexed, we only
need to index a reduced prefix of the string. Once a new string r is
processed, it is guaranteed to have L 1 -norm larger than or equal to
all strings s already indexed, hence we can probe the index using the
prefix P θ ( r )of r to find all candidate pairs using the reduced prefix
principle.
It is easy to see that the index reduction principle does not apply to
Weighted Intersection and Normalized Weighted Intersection similarity.
In addition, it can be shown that it does not help reduce the prefixes
for Dice similarity. On the other hand it can help reduce the prefixes
for Cosine similarity.
Definition 7.5 (Cosine Reduced Prefix). The reduced prefix of
string s is defined as
P θ ( s ).
The following is true:
Lemma 7.6. Given two strings s, r s.t.
2 , the reduced prefix
P θ ( s )of s and the prefix P θ ( r )of r , it holds that
C
s
r
2
⇒P θ ( s )
∩P θ ( r )
( s, r )
θ
=
.
 
Search WWH ::




Custom Search