least quadratic space and time, or expensive reorganization (selective
MUSCLES).
The problem of principal components analysis (PCA) and SVD on
streams has been addressed in [44] and [24]. Both of these approaches
focus on discovering linear correlations among multiple streams and on
applying these correlations for further data processing and anomaly
detection [44]. The method of [24] first performs dimensionality reduction
with random projections, and then periodically computes the SVD. However,
the method incurs some overhead because of the SVD recomputation, and it
cannot easily handle missing values. Also related is the work of [13], which uses
a different formulation of linear correlations and focuses on compressing
historical data, mainly for power conservation in sensor networks.
Finally, the work in [6] proposes an approach to combine segmentation of
multidimensional series with dimensionality reduction. The reduction
is on the segment representatives and it is performed across dimensions
(similar to [44]), not along time, and the approach is not applicable to
streams.
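The two-stage idea described above for [24] (random projections followed by a periodic SVD) can be illustrated with a minimal sketch. It is not the implementation from [24]: the function name, the parameters `k`, `period`, `window` and `rank`, the choice to project across streams, and the use of a sliding window are all assumptions made for illustration.

```python
from collections import deque

import numpy as np


def periodic_svd_on_projections(rows, n_streams, k=8, period=50,
                                window=200, rank=2, seed=0):
    """Project each time step to k dimensions, then recompute an SVD
    every `period` steps over the most recent `window` projected rows.

    All names and defaults here are hypothetical, not taken from [24].
    """
    rng = np.random.default_rng(seed)
    # Gaussian random projection from n_streams down to k dimensions.
    R = rng.standard_normal((n_streams, k)) / np.sqrt(k)
    recent = deque(maxlen=window)   # sliding window of projected rows
    results = []
    for t, x in enumerate(rows, start=1):
        recent.append(np.asarray(x, dtype=float) @ R)
        if t % period == 0:
            Y = np.vstack(list(recent))
            # Full recomputation: this is the periodic overhead noted above.
            _, s, vt = np.linalg.svd(Y, full_matrices=False)
            results.append((t, s[:rank], vt[:rank]))
    return results
```

Each returned tuple holds the time step, the leading singular values, and the leading right singular vectors in the projected space. Note that a missing value in any row would propagate through the projection, which is consistent with the limitation mentioned above.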
Beyond discovering and leveraging possibly evolving patterns in streaming
series in an unsupervised fashion, the work in [55] leverages
commonalities in a set of given query patterns, in order to discover them
efficiently among streaming data. The work in [53] and [50] studies
“anytime” algorithms for nearest-neighbor classification on streams of
either single items or batches of items. In such a setting, available
resources (time or buffer space) can be traded off for increased accuracy.
The work in [5] develops anytime algorithms for outlier detection on data
streams, using a hierarchical cluster structure as a reduced
representation of the incoming data.
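The trade-off behind the "anytime" nearest-neighbor classifiers mentioned above can be sketched with a simple budgeted scan. This is only an illustration of the interruptible behavior, not the algorithms of [53] or [50]; the function name and the `budget` parameter (a cap on distance computations) are assumptions.

```python
import math


def anytime_nn_classify(query, examples, budget):
    """Return the best (label, distance) found within `budget`
    distance computations; a larger budget can only improve the answer.

    `examples` is a sequence of (point, label) pairs. Hypothetical sketch.
    """
    best_label, best_dist = None, math.inf
    for n, (point, label) in enumerate(examples):
        if n >= budget:          # resources exhausted: answer with best so far
            break
        d = math.dist(query, point)
        if d < best_dist:
            best_label, best_dist = label, d
    return best_label, best_dist
```

With a small budget the classifier answers quickly from the examples seen so far; with a budget covering all examples it reduces to exact 1-nearest-neighbor classification.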
Closely related to [48] (see below) is [21], which develops a joint
compression framework for collections of time series that provides
guarantees on the maximum reconstruction error, while also allowing queries
to be answered using indices directly on the compressed representation.
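To make the notion of a guaranteed maximum reconstruction error concrete, the following sketch compresses a single series into constant segments so that no point deviates from its segment value by more than `eps`. It is a deliberately simple stand-in, not the joint framework of [21]; the function names and the greedy piecewise-constant scheme are assumptions.

```python
def compress_max_error(series, eps):
    """Greedy piecewise-constant compression with max error <= eps.

    Returns a list of (start, end, value) with `end` exclusive.
    Hypothetical illustration only.
    """
    if not series:
        return []
    segments = []
    start, lo, hi = 0, series[0], series[0]
    for i in range(1, len(series)):
        x = series[i]
        new_lo, new_hi = min(lo, x), max(hi, x)
        if new_hi - new_lo > 2 * eps:
            # Adding x would break the bound: close the current segment
            # at the midpoint of its value range and start a new one.
            segments.append((start, i, (lo + hi) / 2))
            start, lo, hi = i, x, x
        else:
            lo, hi = new_lo, new_hi
    segments.append((start, len(series), (lo + hi) / 2))
    return segments


def decompress(segments):
    """Rebuild the approximate series from (start, end, value) segments."""
    out = []
    for start, end, value in segments:
        out.extend([value] * (end - start))
    return out
```

Because each segment stores the midpoint of a value range no wider than `2 * eps`, every reconstructed point lies within `eps` of the original, up to floating-point rounding.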
Sensor streams. A number of related techniques for correlation and
prediction across multiple sensor streams are covered in [2, 57, 11, 51].
Such methods can be used to improve the power efficiency of a
sensor network, because only the non-redundant sensors need to transmit
their data at higher sampling rates.
2.2 Compression and filtering
Initial work on time series representation [3, 19] uses the Fourier
transform. Even more recent work uses fixed, predetermined bases or approx-