Database Reference
In-Depth Information
improve the minimum destroy operations bound by first computing
the number of operations needed to destroy the q -grams in Q s→r and
then in Q r→s . Then, we take the larger of the two to be the minimum
destroy operations of the mismatching q -grams of pair s, r .
An optimal greedy algorithm for computing the minimum destroy
operations with cost linear in the number of q -grams is shown as Algo-
rithm 8.2.1. The algorithm selects the next unprocessed q -gram and
makes a conceptual replacement of its last character, in essence destroy-
ing the current q -gram and all subsequent q -grams incident on this
character. The algorithm iterates until no more q -grams are left to
destroy.
Algorithm 8.2.1: Minimum Destroy Operations( Q )
Let Q =
{
( λ 1 ,p 1 ) ,..., ( λ ,p )
}
s.t. p i <p j ,
i<j
e
0 ,p
0
for all ( λ i ,p i )
Q
if p i >p
then e
do
e +1
p
p i + q
1
return e
Given query string v , the prefix filter principle dictates that a string
s is a viable candidate if and only if the prefixes of v and s have a
non-empty intersection. Given that the prefixes have a fixed size of
+1 q -grams, in the worst case the prefix of a candidate can have
as many as mismatching q -grams. This yields ample opportunity
for the mismatch filter to conclude that there are enough mismatching
q -grams within the prefixes to prune the candidate.
In fact, we can strengthen the mismatch filter pruning condition
even further by computing the minimum number of mismatching q -
grams within the prefix of a string that will guarantee an edit distance
larger than θ . The length of the prefix containing these q -grams can
potentially be smaller than + 1, depending on how much overlap
the q -grams in the shorter prefix have. We call the shorter prefix the
 
Search WWH ::




Custom Search