A SURVEY OF RECENT METHODS FOR EFFICIENT RETRIEVAL OF SIMILAR TIME SEQUENCES - Data Mining in Time Series Databases - page 29

Database Reference

In-Depth Information

An important result given by Faloutsos et al. (1994) is the proof that in

order to guarantee completeness (no false dismissals), the distance function

used in the signature space must underestimate the true distance mea-

sure, or:

d k (˜

x,

˜

y

)

≤ d

(

x, y

)

.

(4)

This requirement is called the bounding lemma. Assuming that (1.4)

holds, an intuitive way of stating the resulting situation is: “if two signa-

tures are far apart, we know the corresponding [sequences] must also be far

apart” [Faloutsos et al. (1997)]. This, of course, means that there will be

no false dismissals. To minimise the number of false alarms, we want

d k to

approximate

d

as closely as possible. The bounding lemma is illustrated in

Figure 3.

This general method of dimensionality reduction may be summed up as

follows [Keogh et al. (2001b)]:

1. Establish a distance measure

from a domain expert.

2. Design a dimensionality reduction technique to produce signatures of

length

d

k

,where

k

can be eciently handled by a standard spatial access

method.

3. Produce a distance measure

-dimensional signature space,

and prove that it obeys the bounding condition (4).

d k

over the

k

In some applications, the requirement in (4) is relaxed, allowing for a

small number of false dismissals in exchange for increased performance.

Such methods are called approximate.

The dimensionality reduction may in itself be used to speed up the

sequential scan, and some methods (such as the piecewise linear approxi-

mation of Keogh et al. ,whichisdescribedinSection4.2)relyonlyonthis,

without using any index structure.

Fig. 3.

An intuitive view of the bounding lemma.

Next Page

Data Mining in Time Series Databases

Search WWH ::

Custom Search

Home