INDEXING OF COMPRESSED TIME SERIES - Data Mining in Time Series Databases

Database Reference

In-Depth Information

2. Previous Work

We review related work on the comparison and indexing of time series.

Feature sets. Researchers have considered various feature sets for com-

pressing time series and measuring similarity between them. They have

extensively studied Fourier transforms, which allow fast compression [Singh

and McAtackney (1998), Sheikholeslami et al. (1998), Stoffer (1999), Yi

et al. (2000)], however, this technique has several disadvantages. In par-

ticular, it smoothes local extrema, which may lead to a loss of important

information, and it does not work well for erratic series [Ikeda et al. (1999)].

Chan and his colleagues applied Haar wavelet transforms to time series and

showed several advantages of this technique over Fourier transforms [Chan

and Fu 1999, Chan et al. (2003)].

Guralnik and Srivastava (1999) considered the problem of detecting a

change in the trend of a data stream, and developed a technique for finding

“change points” in a series. Last et al. (2001) proposed a general frame-

work for knowledge discovery in time series, which included representa-

tion of a series by its key features, such as slope and signal-to-noise ratio.

They described a technique for computing these features and identifying

the points of change in the feature values.

Researchers have also studied the use of small alphabets for compression

of time series, and applied string matching to the pattern search [Agrawal

et al. (1995), Huang and Yu (1999), Andr-Jnsson and Badal (1997), Lam

and Wong (1998), Park et al. (1999), Qu et al. (1998)]. For example,

Guralnik et al . (1997) compressed stock prices using a nine-letter alphabet.

Singh and McAtackney (1998) represented stock prices, particle dynam-

ics, and stellar light intensity using a three-letter alphabet. Lin and Risch

(1998) used a two-letter alphabet to encode major spikes in a series. Das et

al . (1998) utilized an alphabet of primitive shapes for ecient compression.

These techniques give a high compression rate, but their descriptive power

is limited, which makes them inapplicable in many domains.

Perng et al. (2000) investigated a compression technique based on

extraction of “landmark points,” which included local minima and max-

ima. Keogh and Pazzani (1997; 1998) used the endpoints of best-fit line

segments to compress a series. Keogh et al. (2001) reviewed the compres-

sion techniques based on approximation of a time series by a sequence of

straight segments. We describe an alternative compression technique, based

on selection of important minima and maxima.

Search WWH ::

Custom Search

Home