A SURVEY OF RECENT METHODS FOR EFFICIENT RETRIEVAL OF SIMILAR TIME SEQUENCES - Data Mining in Time Series Databases

Database Reference

In-Depth Information

similar to the PCA, except that the segments need not be of equal length.

Thus regions with great fluctuations may be represented with several short

segments, while reasonably featureless regions may be represented with

fewer, long segments. The main contribution of this representation is that

it is a more effective compression than the PCA, while still representing the

original faithfully.

Two distance measures are developed for the APCA, one which is guar-

anteed to underestimate Euclidean distance, and one which can be cal-

culated more eciently, but which may generate some false dismissals. It

is also shown that this technique, like the PCA, can handle arbitrary

L p

norms. The empirical data suggest that the APCA outperforms both meth-

ods based on the discrete Fourier transform, and methods based on the dis-

crete wavelet transform with a speedup of one to two orders of magnitude.

In a recent paper, Keogh (2002) develops a distance measure that is

a lower bound for dynamic time warping, and uses the PCA approach to

index it. The distance measure is based on the assumption that the allowed

warping is restricted, which is often the case in real applications. Under this

assumption, Keogh constructs two warped versions of the sequence to be

indexed: An upper and a lower limit. The PCA signatures of these limits are

then extracted, and together with Keogh's distance measure form an exact

index (one with no false dismissals) with high precision. Keogh performs

extensive empirical experiments, and his method clearly outperforms any

other existing method for indexing time warping.

3.4. Landmark Methods

In 1997, Keogh and Smyth introduce a probabilistic method for sequence

retrieval, where the features extracted are characteristic parts of the

sequence, so-called feature shapes . Keogh (1997) uses a similar landmark

based technique. Both these methods also use the dimensionality reduction

technique of piecewise linear approximation (see Section 4.2) as a prepro-

cessing step. The methods are based on finding similar landmark features

(or shapes) in the target sequences, ignoring shifting and scaling within

given limits. The technique is shown to be significantly faster than sequen-

tial scanning (about an order of magnitude), which may be accounted for by

the compression of the piecewise linear approximation. One of the contribu-

tions of the method is that it is one of the first that allows some longitudinal

scaling.

A more recent paper by Perng et al . (2000) introduces a more general

landmark model. In its most general form, the model allows any point of

Search WWH ::

Custom Search

Home