Algorithms for Edit Based Similarity - Approximate String Processing

Database Reference

In-Depth Information

Table 8.1. Active nodes for each prefix of query

“Found” on the trie of Figure 5.1.

∅

F

Fo

Fou

Foun

Found

U, 1

Fu, 1

Fun, 1

Fund, 1

F,1

F,0

Fo,0

Fo,1

C, 1

Foo, 1

Fu, 1

Fe, 1

Fo, 1

Fe, 1

(hence active node

) = 0).

We complete set A 0 by adding all nodes corresponding to strings with

lengths in the interval [1 ,θ ]. These strings correspond to performing

from1upto θ arbitrary insertions to the empty string, and are still

within edit distance θ from v 0 .

The key observation now is that active set A i can be computed

directly from set A i− 1 . The recurrence relation can be derived from

the recurrence relation of the dynamic programming algorithm for edit

distance verification (Algorithm 2.1.1). The intuition is that we are

trying to incrementally compute the dynamic programming matrix of

Algorithm 2.1.1 for all strings contained in the trie at once (by virtue

of computing the edit distance between the query string and all active

paths of the trie).

Given query prefix v i , 1 ≤ i ≤|v| , we consider all strings with lengths

within the interval [ i

∅

is associated with edit distance

E

(

∅

,

∅

θ,i + θ ] (i.e., the L 0 -norm filter bounds for

v i ). A 0 corresponds to the first row of the dynamic programming

matrix. Every time we increase i we need to compute a new row in

the matrix. The new row is computed based on the values in the previ-

ous row. Assume that we are computing A i from A i− 1 for prefix v i . The

algorithm proceeds by examining every entry in set A i− 1 and taking

three actions:

−

(1) It determines whether an active node N should be added in

set A i by increasing its edit distance by one. This corresponds

to a deletion of character σ i (and value d 1 in Algorithm 2.1.1).

(2) It tests whether there exists a child node N of N , s.t.

σ ( N )= σ i . In that case N is added in A i with edit

Approximate String Processing

Search WWH ::

Custom Search

Home