Information Technology Reference
In-Depth Information
One obvious solution is thus to reduce the time series dimensionality, adopting
a transform that preserves the distance between two time series (or underesti-
mates it: in this case a post-processing step will be required, to filter out the
so-called “false alarms”; the requirement is never to overestimate the distance, so
that no “false dismissals” can exist [17]). Widely used transforms are the Discrete
Fourier Transform (DFT) [2], and the Discrete Wavelet Transform (DWT) [10].
DFT maps time series to the frequency domain. DFT application for dimen-
sionality reduction stems from the observation that, for the majority of real-world
time series, the first (1-3) Fourier coecients carry the most meaningful infor-
mation, and the remaining ones can be safely discarded. Moreover, Parseval's
theorem [38] guarantees that the distance in the frequency domain is the same
as in the time domain, when resorting to any similarity measure that can be ex-
pressed as the Euclidean distance between feature vectors in the feature space.
In particular, resorting only to the first Fourier coecients can underestimate
the real distance, but never overestimates it. The definition of similarity can also
be extended with invariance under a group of transformations, like amplitude
scaling and shift (see [15,41]).
In DFT, comparison is based on low frequency components, where most en-
ergy is presumed to be concentrated on. DFT is good for detecting irregularities
in signals, but it looses the temporal aspects, i.e., when in time the irregularities
occur. An interesting way of detecting these temporal attributes is to transform
the time series into time-frequencies, i.e. to retain both temporal and frequency
aspects of the signals. This can be reached exploiting DWT, since wavelets,
which are basis functions used to represent other functions, are local both in
time and in frequency. DWT can be repeatedly applied to the data: each ap-
plication brings out a higher resolution of the data, while at the same time it
smoothes the remaining data. The output of the DWT consists of the remain-
ing smooth components and of all the accumulated detail components. DWT,
like any orthonormal transform, preserves the Euclidean distance as the DFT
does. Moreover, Chan & Fu [10] have obtained through experiments that the
application of the Haar wavelet outperforms DFT in similarity-based time series
retrieval. The number of wavelet coecients to be kept, although lower than
the original data dimensionality, is higher than in the case of DFT application.
Since R-trees and variants perform well with 5 dimensions at most [7], other in-
dex structures, such as the X-tree [7], and the TV-tree [50], have to be adopted
if resorting to DWT.
A different approach to dimensionality reduction is Piecewise Constant Ap-
proximation (PCA) (see e.g. [22,23]): it consists in dividing a time series into
k segments, and in using their average values as a k -dimensional feature vector
(where obviously k<<n , the original data dimensionality). The best value of k
can also be estimated. PCA is robust to various transformations, such as scaling
and offset translation, and its calculation is claimed to be fast.
The choice of the most cost-effective transformation to apply should be done
on the basis of the application at hand. Examples in the medical domain will be
illustrated in the following subsection.
Search WWH ::




Custom Search