Database Reference
In-Depth Information
Ideally, the distance function is desired to be a metric, in order to match
the human intuition of similarity. The triangle inequality excludes the case
in which
) is very large. In prac-
tice, however, there may exist distance functions which do not satisfy the
triangle inequality. To judge the suitability of these distance functions, the
work [6] suggests quasi-metrics with a relaxed triangle inequality. Instead
of the strict triangle inequality, the relation:
d
(
p, r
)and
d
(
r, q
) are both small, but
d
(
p, q
d
(
p, q
)
d
(
p, r
)+
d
(
r, q
)
1+
ε
is required now. Here
is not
very large, the relaxed triangle inequality still retains the human intuition
of similarity. Note that the strict triangle inequality is a special case with
ε
ε
is a small nonnegative constant. As long as
ε
= 0. The fast set median search procedure [19] sketched above easily
extends to quasi-metrics. In this case the relationship (1) is replaced by:
max d
)
(
p, r
)
, d
(
q, r
)
d
(
p, q
)
− d
(
q, r
)
− d
(
p, r
,
∀p, q, r ∈ S
1+
ε
1+
ε
which can be used in the same manner to establish a lower bound
g
(
p
).
4.2. Approximate Set Median Search in Arbitrary Spaces
Another approach to fast set median search makes no assumption on the
distance function and covers therefore non-metrics as well. The idea of this
approximate algorithm is simple. Instead of computing the sum of distances
of each string to all the other strings of
S
to select the best one, only a
subset of
is used to obtain an estimation of the consensus error [30]. The
algorithm first calculates such estimations and then calculates the exact
consensus errors only for strings that have low estimations.
This approximate algorithm proceeds in two steps. First, a random sub-
S
set
S r of
N r strings is selected from
S
.Foreachstring
p
of
S
, the consensus
error
E S r (
p
) relative to
S r
is computed and serves as an estimation of the
consensus error
N t strings with the lowest con-
sensus error estimations are chosen. The exact consensus error
E S (
p
). In the second step
E S (
p
)is
computed for the
N t
strings and the string with the minimum
E S (
p
)is
regarded the (approximate) set median string of
S
.
5. Computation of Generalized Median Strings
While the set median problem is characterized by selecting one particular
member out of a given set of strings, the computation of generalized median
 
Search WWH ::




Custom Search