Database Reference
In-Depth Information
strings is inherently constructive. The theoretical results from Section 3
about computational complexity indicate the fundamental di
culties we
are faced with. In the following we describe various algorithms for comput-
ing generalized median strings. Not surprisingly, they are either of expo-
nential complexity or approximate.
5.1.
An Exact Algorithm and Its Variants
An algorithm for the exact computation of generalized median strings under
the Levenshtein distance is given in [21]. Let
ε
be the empty string and
Σ
=Σ
∪{ε}
the extended alphabet. We define:
δ
(
r
1
,r
2
,...,r
N
)=min
v∈
Σ
[
c
(
v → r
1
)+
c
(
v → r
2
)+
···
+
c
(
v → r
N
)]
.
δ
The operator
can be interpreted as a voting function, as it determines
v
the best value
at a given stage of computation. Finding an optimal value
requires an exhaustive search over Σ
in the most general case, but in
practice the cost function is often simple such that a shortcut can be taken
and the choice of the optimal
of
v
v
is not costly.
this way, the generalized median string can be com-
puted by means of dynamic programming in an
Having defined
δ
-dimensional array, sim-
ilarly to string edit distance computation [38]. For the sake of notational
simplicity, we only discuss the case
N
N
= 3. Assume the three input strings
be
u
1
u
2
···u
l
,
v
1
v
2
···v
m
,and
w
1
w
2
···w
n
. A three-dimensional distance
table of dimension
l × m × n
is constructed as follows:
Initialization:
d
0
,
0
,
0
=0;
Iteration:
d
i−
1
,j−
1
,k−
1
+
δ
u
i
,v
j
,w
k
)
(
d
i−
1
,j−
1
,k
+
δ
(
u
i
,v
j
,ε
)
d
i−
1
,j,k−
1
+
δ
(
u
i
,ε,w
k
)
0
≤ i ≤ l
d
i,j,k
=min
d
i−
1
,j,k
+
δ
(
u
i
,ε,ε
)
0
≤ j ≤ m
d
i,j−
1
,k−
1
+
δ
(
ε, v
j
,w
k
)
0
≤ k ≤ n
d
i,j−
1
,k
+
δ
(
ε, v
j
,ε
)
d
i,j,k−
1
+
δ
(
ε, ε, w
k
)
End:
if (
i
=
l
)
∧
(
j
=
m
)
∧
(
k
=
n
)
The computation requires
O
(
lmn
) time and space. The path in the
distance table that leads from
d
0
,
0
,
0
to
d
l,m,n
defines the generalized
median string ¯
p
with
d
l,m,n
being the consensus error
E
S
(¯
p
). Note that
a generalization to arbitrary
N
is straightforward. If the strings of
S
are