Database Reference
In-Depth Information
positions, can be applied over the existing inverted index based algo-
rithms. It holds that
s = { ( λ 1 ,p 1 ) ,..., ( λ s m ,p s m ) }
r =
Lemma 8.2.
Given
strings
and
( λ 1 ,p 1 ) ,..., ( λ n ,p n )
{
}
, let the matching set of positional q -grams be
r = {
( λ i ,p i ) , ( λ j ,p j )
s
}
. Then
λ i j
p i
p j |≤
E
( s, r )
θ
⇒∀
s
r :
|
θ.
θ and that a q -gram λ i
Proof. Assuming that
E
( s, r )
s matches
q -gram λ j
p i
p j |
, clearly we need at least θ + 1 edit
operations to transform the position of λ i into that of λ j , without
affecting any other q -grams, to obtain one q -gram set from the other.
This is a contradiction.
r with
|
The claim states that if two strings are within edit distance θ , then
the positions of the matching q -grams between the two strings cannot
differ by more than θ .
We can use Lemma 8.2 when scanning inverted lists to prune
candidates based on positional information of q -grams, by comparing
against the positions of these q -grams in the query string. Positional
information can be used for all algorithms discussed in Section 6. In
the algorithms, while scanning a list corresponding to a query q -gram
with position p , all strings whose positions for that q -gram differ by
more than θ from p can be ignored. A string is considered a viable
candidate if and only if the resulting q -gram intersection with the
query exceeds threshold τ .
Finally, we can state a simple length filter for edit distance, which
can be used in all algorithms for pruning.
Lemma 8.3 (Edit Distance L 0 -norm Filter). Given strings s, r
and edit distance threshold θ
E
( s, r )
θ
⇒||
s
|−|
r
|| ≤
θ.
 
Search WWH ::




Custom Search