Database Reference
In-Depth Information
1 ,s = σ 1
length
σ . Finally, a replacement R ( s, i, σ )
replaces the character σ i with character σ
···
σ i− 1 σ i +1
···
Σ resulting in string
s = σ 1 ···
σ i− 1 σσ i +1 ···
σ , of the same length .
Definition 2.1 (Edit Distance). The edit distance
( s, r ) between
two strings s, r is the minimum number of primitive operations (i.e.,
insertions, deletions, and replacements) needed to transform s to r .
E
Edit distance is also known as Levenshtein distance.
For example, let s ='1 Laptop per Child' and r = 'One Lap-
top per Child'. Then
( s, r ) = 3 since the sequence of operations
I ( s, 1 , ' n ') ,I ( s, 1 , ' O ') ,R ( s, 3 , ' e ') results in string r , and there is no
shorter sequence of operations that can transform s into r .
Clearly, edit distance is symmetric. The minimum edits needed to
transform r into s is the inverse of that needed to transform s into
r ( R ( r, 3 , '1') ,D ( r, 1) ,D ( r, 1) in this example). In addition, edit distance
satisfies the triangular inequality, i.e.,
E
( t, r ). This is
easy to show using induction. Hence, edit distance is a metric.
The edit distance between two strings can be computed using
dynamic programming in O (
E
( s, r )
≤E
( s, t )+
E
2 ) time for strings of
length |s| , while the most ecient known algorithm requires O ( |s| )
space and O (
|
s
|
) space and O (
|
s
|
2 /log
) time instead [52]. Clearly computing the edit
distance between strings is a very expensive operation. However, if the
goal is to test whether the edit distance between two strings is within
some threshold θ , then there exists a verification algorithm that runs
in O ( θ ) space and O ( θ
|
s
|
|
s
|
) time, which is very ecient for small thresh-
olds θ . The idea is once more based on dynamic programming, but the
verification algorithm tests only the entries with offset no more than
θ from the diagonal in the dynamic programming matrix. The algo-
rithm is shown as Algorithm 2.1.1, and a sample execution between the
strings '1 Laptop' and 'One Laptop' with θ = 2 is shown in Table 2.1.
The computation of the actual edit distance between the two strings
(as opposed to simply verifying whether the edit distance is smaller
than 2) could, in the worst case, require the full computation of the
dynamic programming matrix, as opposed to the constrained verifica-
tion algorithm.
|
s
|
 
Search WWH ::




Custom Search