Database Reference
In-Depth Information
2. Previous Work
We review related work on the comparison and indexing of time series.
Feature sets. Researchers have considered various feature sets for com-
pressing time series and measuring similarity between them. They have
extensively studied Fourier transforms, which allow fast compression [Singh
and McAtackney (1998), Sheikholeslami et al. (1998), Stoffer (1999), Yi
et al. (2000)], however, this technique has several disadvantages. In par-
ticular, it smoothes local extrema, which may lead to a loss of important
information, and it does not work well for erratic series [Ikeda et al. (1999)].
Chan and his colleagues applied Haar wavelet transforms to time series and
showed several advantages of this technique over Fourier transforms [Chan
and Fu 1999, Chan et al. (2003)].
Guralnik and Srivastava (1999) considered the problem of detecting a
change in the trend of a data stream, and developed a technique for finding
“change points” in a series. Last et al. (2001) proposed a general frame-
work for knowledge discovery in time series, which included representa-
tion of a series by its key features, such as slope and signal-to-noise ratio.
They described a technique for computing these features and identifying
the points of change in the feature values.
Researchers have also studied the use of small alphabets for compression
of time series, and applied string matching to the pattern search [Agrawal
et al. (1995), Huang and Yu (1999), Andr-Jnsson and Badal (1997), Lam
and Wong (1998), Park et al. (1999), Qu et al. (1998)]. For example,
Guralnik et al . (1997) compressed stock prices using a nine-letter alphabet.
Singh and McAtackney (1998) represented stock prices, particle dynam-
ics, and stellar light intensity using a three-letter alphabet. Lin and Risch
(1998) used a two-letter alphabet to encode major spikes in a series. Das et
al . (1998) utilized an alphabet of primitive shapes for ecient compression.
These techniques give a high compression rate, but their descriptive power
is limited, which makes them inapplicable in many domains.
Perng et al. (2000) investigated a compression technique based on
extraction of “landmark points,” which included local minima and max-
ima. Keogh and Pazzani (1997; 1998) used the endpoints of best-fit line
segments to compress a series. Keogh et al. (2001) reviewed the compres-
sion techniques based on approximation of a time series by a sequence of
straight segments. We describe an alternative compression technique, based
on selection of important minima and maxima.
Search WWH ::




Custom Search