Database Reference
In-Depth Information
Pairwise Distances
Seq1
Seq2
Seq3
Query
Seq1
Seq2
Seq3
0
20
110
20
0
90
110
90
0
Seq 1
Seq 2
Seq 3
Fig. 3.
Pruning power of triangle inequality.
Seq
3, and we have already recorded their pair-wise distances in a
table. Suppose we pose a query
2
,
and
Seq
Q
, and we want to find the closest sequence
to
. Assume we have already found a sequence, best-match-so-far ,whose
distance from
Q
Q
is 20. Comparing the query sequence
Q
to Seq 2 yields
D
(
Q, Seq
2) = 150. However, because
D
( Seq 2, Seq 1) = 20 and it is true that:
D
20 = 130
we can safely prune Seq1 , since it is not going to offer a better solution.
Similarly, we can prune Seq 3.
We are going to use non-metric distance functions. This choice imposes
diculties on how to prune data in an index, since we cannot apply the
technique outlined above. As a result it is much more dicult to design
ecient indexing techniques for non-metric distance functions. In the next
section we show that despite this fundamental drawback it is important to
use non-metric distance functions in conditions of noise. In section 5 we will
also show that the proper choice of a distance measure can overcome this
drawback to some degree.
(
Q
, Seq 1)
≥ D
(
Q
, Seq 2)
− D
( Seq 2, Seq 1)
⇒ D
(
Q
, Seq 1)
150
2.3. Motivation for Non-Metric Distance Functions
Distance functions that are robust to extremely noisy data will typically
violate the triangular inequality. These functions achieve this by not con-
sidering the most dissimilar parts of the objects. Moreover, they are useful,
because they represent an accurate model of the human perception, since
when comparing any kind of data (images, time-series etc), we mostly focus
Search WWH ::




Custom Search