Database Reference
In-Depth Information
strings is inherently constructive. The theoretical results from Section 3
about computational complexity indicate the fundamental di culties we
are faced with. In the following we describe various algorithms for comput-
ing generalized median strings. Not surprisingly, they are either of expo-
nential complexity or approximate.
5.1. An Exact Algorithm and Its Variants
An algorithm for the exact computation of generalized median strings under
the Levenshtein distance is given in [21]. Let
ε
be the empty string and
Σ
∪{ε}
the extended alphabet. We define:
δ
(
r 1 ,r 2 ,...,r N )=min
v∈ Σ
[
c
(
v → r 1 )+
c
(
v → r 2 )+
···
+
c
(
v → r N )]
.
δ
The operator
can be interpreted as a voting function, as it determines
v
the best value
at a given stage of computation. Finding an optimal value
requires an exhaustive search over Σ in the most general case, but in
practice the cost function is often simple such that a shortcut can be taken
and the choice of the optimal
of
v
v
is not costly.
this way, the generalized median string can be com-
puted by means of dynamic programming in an
Having defined
δ
-dimensional array, sim-
ilarly to string edit distance computation [38]. For the sake of notational
simplicity, we only discuss the case
N
N
= 3. Assume the three input strings
be
u 1 u 2 ···u l ,
v 1 v 2 ···v m ,and
w 1 w 2 ···w n . A three-dimensional distance
table of dimension
l × m × n
is constructed as follows:
Initialization:
d 0 , 0 , 0 =0;
Iteration:
d i− 1 ,j− 1 ,k− 1 +
δ
u i ,v j ,w k )
(
d i− 1 ,j− 1 ,k
+
δ
(
u i ,v j
)
d i− 1 ,j,k− 1
+
δ
(
u i ,ε,w k )
0
≤ i ≤ l
d i,j,k =min
d i− 1 ,j,k
+
δ
(
u i ,ε,ε
)
0
≤ j ≤ m
d i,j− 1 ,k− 1
+
δ
(
ε, v j ,w k )
0
≤ k ≤ n
d i,j− 1 ,k
+
δ
(
ε, v j
)
d i,j,k− 1
+
δ
(
ε, ε, w k )
End: if (
i
=
l
)
(
j
=
m
)
(
k
=
n
)
The computation requires
O
(
lmn
) time and space. The path in the
distance table that leads from
d 0 , 0 , 0
to
d l,m,n
defines the generalized
median string ¯
p
with
d l,m,n
being the consensus error
E S
p
). Note that
a generalization to arbitrary
N
is straightforward. If the strings of
S
are
Search WWH ::




Custom Search