Database Reference
In-Depth Information
If the new string read is not already contained in M , a new
entry is created and reported as an answer or inserted in M ,
using reasoning similar to the previous step.
After one round robin iteration, the bit vector of each candidate in the
candidate set is updated, and candidates with fully set bit vectors are
reported as answers or evicted from the candidate set accordingly.
The algorithm can terminate early if two conditions are met. First
the candidate set is empty, which means no more viable candidates
whose similarity has not been completely computed yet exist. Second,
the maximum possible score of a conceptual frontier string that appears
at the current position in all lists cannot exceed the query threshold.
Lemma 6.2. Let L a
L v be the set of active lists. The terminating
condition
L ( λ i ) ∈L a W ( λ i )
max(min L ( λ i ) ∈L a
f =
N
1 ) <θ,
f i 1 ,
v
does not lead to any false dismissals.
Proof. The terminating condition leads to no false dismissals if and
only if the maximum possible normalized weighted intersection of any
unseen element is smaller than θ . To maximize
( s, v ) we need to
maximize the numerator and minimize the denominator.
After each round of the nra algorithm, let the frontier element of
each list (i.e., the last element read on that list) be f i , 1
N
i
m . Each
frontier element corresponds to a string f i with L 1 -norm
f i 1 . Let
L a
L v be the set of active lists (i.e., lists that have not been fully
traversed yet).
A conceptual frontier element that has not been seen yet has a pos-
sible maximum weighted intersection of L ( λ i ) ∈L a W ( λ i ). To minimize
the denominator, notice that
N
( s, v ) is monotone decreasing in
s
1 .
Hence, for strings s, r s.t.
s
1 <
r
1 and
s
v
1 =
r
v
1 it holds
that
N
( s, v )
≥N
( r, v ). Thus, the maximum possible similarity of an
 
Search WWH ::




Custom Search