Database Reference
In-Depth Information
2. The Problem
The problem of retrieving similar time sequences may be stated as follows:
Given a sequence
q
, a set of time sequences
X
, a (non-negative) distance
measure
d
,anda tolerance threshold
ε
, find the set
R
of sequences closer
to
q
than
ε
, or, more precisely:
R
{x ∈ X|d
q, x
≤ ε}.
=
(
)
(3)
Alternatively, one might wish to find the
k
nearest neighbours of
q
,which
amounts to setting
ε
so that
|R|
=
k
. The parameter
ε
is typically supplied
by the user, while the distance function
is domain-dependent. Several
distance measures will be described rather informally in this chapter. For
more formal definitions, see Appendix.
Figure 1 illustrates the problem for Euclidean distance in two dimen-
sions. In this example, the vector
d
x
will be included in the result set
R
,
while
will not.
A useful variation of the problem is to find a set of subsequences of the
sequences in X . This, in the basic case, requires comparing
y
q
not only to
, but to all possible subsequences. 2
If a method retrieves a subset
all elements of
X
S
of
R
, the wrongly dismissed sequences
in
R − S
are called false dismissals .Conversely,if
S
is a superset of
R
,the
sequences in
S − R
are called false alarms.
Fig. 1.
Similarity retrieval.
2 Except in the description of LCS in Appendix, subsequence means contiguous subse-
quence ,or segment.
 
Search WWH ::




Custom Search