Database Reference
In-Depth Information
improve the minimum destroy operations bound by first computing
the number of operations needed to destroy the
q
-grams in
Q
s→r
and
then in
Q
r→s
. Then, we take the larger of the two to be the minimum
destroy operations of the mismatching
q
-grams of pair
s, r
.
An optimal greedy algorithm for computing the minimum destroy
operations with cost linear in the number of
q
-grams is shown as Algo-
rithm 8.2.1. The algorithm selects the next unprocessed
q
-gram and
makes a conceptual replacement of its last character, in essence destroy-
ing the current
q
-gram and all subsequent
q
-grams incident on this
character. The algorithm iterates until no more
q
-grams are left to
destroy.
Algorithm 8.2.1:
Minimum Destroy Operations(
Q
)
Let
Q
=
{
(
λ
1
,p
1
)
,...,
(
λ
,p
)
}
s.t.
p
i
<p
j
,
∀
i<j
e
←
0
,p
←
0
for all
(
λ
i
,p
i
)
∈
Q
if
p
i
>p
then
e
do
←
e
+1
p
←
p
i
+
q
−
1
return
e
Given query string
v
, the prefix filter principle dictates that a string
s
is a viable candidate if and only if the prefixes of
v
and
s
have a
non-empty intersection. Given that the prefixes have a fixed size of
qθ
+1
q
-grams, in the worst case the prefix of a candidate can have
as many as
qθ
mismatching
q
-grams. This yields ample opportunity
for the mismatch filter to conclude that there are enough mismatching
q
-grams within the prefixes to prune the candidate.
In fact, we can strengthen the mismatch filter pruning condition
even further by computing the minimum number of mismatching
q
-
grams within the prefix of a string that will guarantee an edit distance
larger than
θ
. The length of the prefix containing these
q
-grams can
potentially be smaller than
qθ
+ 1, depending on how much overlap
the
q
-grams in the shorter prefix have. We call the shorter prefix the