Database Reference
In-Depth Information
complexity is supported by the following theoretical results. Under the two
conditions:
every edit operation has cost one, i.e.,
c
(
a → b
)=
c
(
ε → a
)=
c
(
a → ε
)=1,
the alphabet is not of fixed size.
It is proved in [15] that computing the generalized median string is
NP-hard. Sim and Park [35] proved that the problem is NP-hard for finite
alphabet and for a metric distance matrix. Another result comes from com-
putational biology. The optimal evolutionary tree problem there turns out
to be equivalent to the problem of computing generalized median strings
if the tree structure is a star (a tree with
of them being
leaves). In [39] it is proved that in this particular case the optimal evolu-
tionary tree problem is NP-hard. The distance function used is problem
dependent and does not even satisfy the triangle inequality. All these theo-
retical results indicate the inherent diculty in finding generalized median
strings. Not surprisingly, the algorithms we will discuss in Section 5 are
either exponential or approximate.
n
+ 1 nodes,
n
4. Fast Computation of Set Median Strings
N 2 ) distance computa-
tions. Considering the relatively high computational cost of each individ-
ual string distance, this straightforward approach may not be appropriate,
especially in the case of a large number of strings. The problem of fast set
median search can be tackled by making use of properties of metric distance
functions or developing approximate algorithms. Several solutions [19,30]
have been suggested for fast set median search in arbitrary spaces. They
apply to the domain of strings as well.
The naive computation of set median requires
O
(
4.1. Exact Set Median Search in Metric Spaces
In many applications the underlying string distance function is a metric
which satisfies:
(i)
d
(
p, q
)
0and
d
(
p, q
) = 0 if and only if
p
=
q
,
(ii)
d
(
p, q
)=
d
(
q, p
),
(iii)
d
(
p, q
)+
d
(
q, r
)
≥ d
(
p, r
).
Search WWH ::




Custom Search