Algorithms for Edit Based Similarity - Approximate String Processing

Database Reference

In-Depth Information

Algorithm 8.3.1: Edit Distance on Trie( v,θ )

Let v = σ 1

σ |v|

A i is a hash table of pairs ( N,e ) with key N

A 0 ←

···

, 0)

for i =1 to θ

(

∅

do for each node N at distance i from root: A 0 ←

( N,i )

for i =1 to |v|

for all ( N,e )

∈

A i− 1

if e +1

≤

// Case 1

if ( N,e )

( N, min( e +1 ,e ))

∈

A i : A i ←

( N,e +1)

for all children N of N

else A i

←

if σ ( N )= σ i

// Case 2

if ( N ,e )

( N , min( e, e ))

∈

A i : A i

←

( N ,e )

for j =1 to θ

else A i ←

−

for all descendants N of N ,j characters

away

do A i

( N ,θ

←

−

e + j )

else if e +1

≤

// Case 3

do if ( N ,e ) ∈ A i : A i ← ( N , min( e +1 ,e ))

else A i

( N ,e +1)

←

return A |v|

A drawback of the trie based algorithm is that it will not scale

for large edit distance thresholds since it will degenerate fast into a

complete traversal of the higher levels of the trie structure. A plethora

of advanced trie based algorithms for reducing the number of paths

traversed have been proposed in the past. The algorithms assume

complete knowledge of the query a priori. The same algorithms can

be straightforwardly modified to work on sux trees. Finally, it is

not hard to see that these techniques can be adapted for computing

weighted edit distance (given a substitution matrix and a cost function

for each edit operation).

Approximate String Processing

Search WWH ::

Custom Search

Home