Database Reference
In-Depth Information
A
B
sequences of edit operations transforming
into
are:
• s 1 =
d → a,
i → n,
a → ε,
n → ε
,
• s 2 =
d → a,
i → ε,
a → ε
,
• s 3 =
d → ε,
i → ε
.
s 3
represents the optimal sequence with the minimum total cost 2 for trans-
forming median into mean among all possible transformations. Therefore,
we observe
Under the edit cost
c
(
a → ε
)=
c
(
ε → a
)=
c
(
a → b
)=1,
a
=
b
,
)=2.
In [38] an algorithm is proposed to compute the Levenshtein edit dis-
tance by means of dynamic programming. Other algorithms are discussed in
[13,34,36]. Further string distance functions are known from the literature,
for instance, normalized edit distance [28], maximum posterior probability
distance [20], feature distance [20], and others [8]. The Levenshtein edit
distance is by far the most popular one. Actually, some of the algorithms
we discuss later are tightly coupled to this particular distance function.
d
(
median, mean
3. Theoretical Results
In this section we summarize some theoretical results related to median
strings. The generalized median is a more general concept and usually a
better representation of the given strings than the set median. From a prac-
tical point of view, the set median can be regarded an approximate solution
of the generalized median. As such it may serve as the start for an iterative
refinement process to find more accurate approximations. Interestingly, we
have the following result (see [13] for a proof):
Theorem 1. Assume that the string distance function satisfies the triangle
inequality. Then
E S
p
)
/E S
p
)
2
2
/|S|
.
That is, the set median has a consensus error relative to
S
that is at
most 2
times the consensus error of the generalized median string.
Independent of the distance function we can always find the set median
2
/|S|
strings by means of 2 N
of
1) pairwise distance computations. This
computational burden can be further reduced by making use of special
properties of the distance function (e.g. metric) or resorting to approximate
procedures. Section 4 will present examples of these approaches.
Compared to set median strings, the computation of generalized median
strings represents a much more demanding task. This is due to the
huge search space which is substantially larger than that for determining
the set median string. This intuitive understanding of the computational
N
(
N −
Search WWH ::




Custom Search