Database Reference
In-Depth Information
For the upper bound:
It holds that s ∩ r 2 ≤r 2 . Hence,
2 ) 2
s 2 r 2
2 ) 2
s 2 r 2
s
r
r
r
(
(
1
C
( s, r )
θ
θ
θ
s
.
2
θ
In addition
Observation 6.2. Given a string s with L 2 -norm
s
2 and the L 2 -
norms of all frontier elements
2 , we can immediately deduce
whether s potentially appears in list L ( λ i ) or not by a simple com-
parison: If
f i
2 and s has not been encountered in list L ( λ i )
yet, then s does not appear in L ( λ i ) (given that lists are sorted in
increasing order of L 2 -norms).
s
2 <
f i
Hence
Lemma 6.9. Let L a ⊆ L v be the set of active lists. The terminating
condition
L ( λ j ) L a : f j 2 f i 2 W ( λ j ) 2
f =
C
max
L ( λ i
<θ,
f i
v
) ∈L a
2
2
does not lead to any false dismissals.
Lemma 6.10. Given query v , candidate string s , and a subset of lists
L v
L v potentially containing s , the best-case similarity score for s is
b ( s, v )= ( v 2 ) 2
C
s 2 v 2 .
The actual algorithms in principle remain the same.
6.1.2.4
Jaccard
Jaccard similarity presents some diculties. Notice that Jaccard is a
monotone aggregate function with partial similarity α i ( s, v )= W ( λ i )
1 .
s
v
Search WWH ::




Custom Search