A SURVEY OF RECENT METHODS FOR EFFICIENT RETRIEVAL OF SIMILAR TIME SEQUENCES - Data Mining in Time Series Databases

Database Reference

In-Depth Information

2. The Problem

The problem of retrieving similar time sequences may be stated as follows:

Given a sequence

q

, a set of time sequences

X

, a (non-negative) distance

measure

d

,anda tolerance threshold

ε

, find the set

R

of sequences closer

to

q

than

ε

, or, more precisely:

R

{x ∈ X|d

q, x

≤ ε}.

=

(

)

(3)

Alternatively, one might wish to find the

k

nearest neighbours of

q

,which

amounts to setting

ε

so that

|R|

=

k

. The parameter

ε

is typically supplied

by the user, while the distance function

is domain-dependent. Several

distance measures will be described rather informally in this chapter. For

more formal definitions, see Appendix.

Figure 1 illustrates the problem for Euclidean distance in two dimen-

sions. In this example, the vector

d

x

will be included in the result set

R

,

while

will not.

A useful variation of the problem is to find a set of subsequences of the

sequences in X . This, in the basic case, requires comparing

y

q

not only to

, but to all possible subsequences. 2

If a method retrieves a subset

all elements of

X

S

of

R

, the wrongly dismissed sequences

in

R − S

are called false dismissals .Conversely,if

S

is a superset of

R

,the

sequences in

S − R

are called false alarms.

Fig. 1.

Similarity retrieval.

2 Except in the description of LCS in Appendix, subsequence means contiguous subse-

quence ,or segment.

Data Mining in Time Series Databases

Search WWH ::

Custom Search

Home