Database Reference
In-Depth Information
S
S
is called a set median of
, neither the generalized median
nor the set median is necessarily unique. This definition was introduced by
Kohonen [20]. Note that different terminology has been used in the liter-
ature. In [13] the set median string and the generalized median string are
termed center string and Steiner string , respectively. In [24] the generalized
median was called consensus sequence.
Different possibility is mentioned by Kohonen [20] too. This is the par-
allel of mean from elementary statistics. Here we would like to search for
p that minimizes
.Foragivenset
d 2 (
p ,q
)
.
q∈S
Martinez-Hinarejos et al. [27] returned to this definition and investigated
the possibility of using mean instead of median.
Note that in a broad sense, the median string concept is related to the
center of gravity of a collection of masses. While the latter represents the
point where all the weight of the masses can be considered to be concen-
trated, the median string corresponds to a single representation of strings
based on a string distance.
Several string distance functions have been proposed in the literature.
The most popular one is doubtlessly the Levenshtein edit distance. Let
A
a 1 a 2 ···a n and
B
=
b 1 b 2 ···b m be two words over Σ. The Levenshtein
=
edit distance
) is defined in terms of elementary edit operations which
are required to transform
d
(
A, B
. Usually, three different types of edit
operations are considered, namely (1) substitution of a symbol
A
into
B
a ∈ A
by a
symbol
b ∈ B, a
=
b
, (2) insertion of a symbol
a ∈
Σin
B
, and (3) deletion
of a symbol
a ∈ A
. Symbolically, we write
a → b
for a substitution,
ε → a
for an insertion, and
for a deletion. To model the fact that some
distortions may be more likely than others, costs of edit operations,
a → ε
c
(
a →
b
)
,c
(
ε → a
), and
c
(
a → ε
), are introduced. Let
s
=
l 1 l 2 ···l k be a sequence
of edit operations transforming
A
into
B
. We define the cost of this sequence
)= i =1 c
by
c
(
s
(
l i ). Given two strings
A
and
B
, the Levenshtein edit
distance is given by
d
(
A, B
)=min
{c
(
s
)
| s
is a sequence of edit operations
transforming
A
into
B}.
To illustrate the Levenshtein edit distance, let us consider two words
A
=
median
and
B
=
mean
built on the English alphabet. Examples of
Search WWH ::




Custom Search