DIMENSIONALITY REDUCTION AND FILTERING ON TIME SERIES SENSOR STREAMS - Managing and Mining Sensor Data

2.1 Dimensionality reduction

Much of the work on stream mining has focused on finding interesting
patterns in a single stream, but multiple streams have also attracted
significant interest. Ganti et al. [22] propose a generic framework for
stream mining. Guha et al. [25] propose a one-pass k-median clustering
algorithm. The work in [15] constructs a decision tree online, by passing
over the data only once. Later on, [29] and [54] addressed the problem of
finding patterns over concept-drifting streams.

The work in [35] proposes parameter-free methods for classic data mining
tasks (i.e., clustering, anomaly detection, classification), based on
compression. The work in [36] proposes a multi-resolution clustering
scheme for time series data. It uses the average coefficients (low
frequencies) of the wavelet transform to perform k-means clustering and
progressively refines the clusters by incorporating higher-level, detail
coefficients. This approach converges much faster than operating directly
on the very high-dimensional original series. Both approaches require the
complete data in advance. The work in [4] proposes a framework for
Phenomena Detection and Tracking (PDT) in sensor networks. It defines a
phenomenon on discrete-valued streams and develops query execution
techniques based on multi-way hash join with PDT-specific optimizations.
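The coarse-to-fine idea behind the multi-resolution scheme of [36] can be sketched as follows. This is only an illustrative sketch, not the algorithm of [36]: it assumes unnormalised Haar averages, series whose length is a power of two, and plain Lloyd iterations, and all function names (haar_averages, kmeans, multires_kmeans) are hypothetical. Moving to a finer level of averages carries the same information as adding the detail coefficients of that level, so each pass starts from the clusters found at the coarser level.

```python
import random

def haar_averages(series, level):
    """Pairwise means applied `level` times (unnormalised Haar averages).

    Each pass halves the length, so the input length is assumed to be
    a power of two.
    """
    out = [float(v) for v in series]
    for _ in range(level):
        out = [(out[i] + out[i + 1]) / 2.0 for i in range(0, len(out), 2)]
    return out

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean_vector(rows):
    return [sum(col) / len(rows) for col in zip(*rows)]

def kmeans(points, centers, iters=20):
    """Plain Lloyd iterations starting from the given centers."""
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(len(centers)),
                      key=lambda j: sq_dist(p, centers[j]))
                  for p in points]
        for j in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster empties
                centers[j] = mean_vector(members)
    return labels

def multires_kmeans(series_list, k, seed=0):
    """Cluster at the coarsest level first, then refine level by level."""
    rng = random.Random(seed)
    max_level = len(series_list[0]).bit_length() - 1  # log2 of the length
    labels = None
    for level in range(max_level, -1, -1):
        points = [haar_averages(s, level) for s in series_list]
        if labels is None:
            # Coarsest level: start from k randomly chosen series.
            centers = [list(p) for p in rng.sample(points, k)]
        else:
            # Seed each center with the mean of its previous members,
            # now expressed at the finer resolution.
            centers = []
            for j in range(k):
                members = [p for p, lab in zip(points, labels) if lab == j]
                centers.append(mean_vector(members) if members
                               else list(rng.choice(points)))
        labels = kmeans(points, centers)
    return labels
```

Because the coarse levels have very few dimensions, most of the assignment work happens on short vectors, and the finer levels typically need only a few iterations to settle. Like the scheme it illustrates, the sketch needs the complete series up front.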

CluStream [1] is a flexible clustering framework with online and offline
components. The online component extends micro-cluster information [61]
by incorporating exponentially-sized sliding windows while coalescing
micro-cluster summaries. Actual clusters are found by the offline
component. StatStream [62] uses the DFT to summarize streams within a
finite window and then computes the highest pairwise correlations among
all pairs of streams, at each timestamp. BRAID [49] addresses the problem
of discovering lag correlations among multiple streams. The focus is on
time- and space-efficient methods for finding the earliest and highest
peak in the cross-correlation functions between all pairs of streams.
Similar to [42] (see below), BRAID employs a representation with fidelity
that decreases with age. The work in [39] has studied how to efficiently
compute pairwise correlations among large collections of time series, by
combining compressed Fourier representations with graph partitioning
techniques. None of CluStream, StatStream, or BRAID explicitly focuses on
discovering hidden variables.
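Two of the building blocks mentioned above, a DFT summary used to estimate correlation and the peak of a cross-correlation function, can be illustrated with a small sketch. It is not the StatStream or BRAID implementation: it works on a single pair of complete, equal-length windows, evaluates the DFT naively, scans lags exhaustively without any age-dependent approximation, and uses hypothetical function names (approx_correlation, best_lag). It assumes the series are real-valued and not constant.

```python
import cmath
import math

def unit_normalise(x):
    """Subtract the mean and scale to unit Euclidean norm."""
    mean = sum(x) / len(x)
    centred = [v - mean for v in x]
    norm = math.sqrt(sum(v * v for v in centred))
    return [v / norm for v in centred]  # assumes x is not constant

def dft_coeffs(x, m):
    """Coefficients 1..m of the unitary DFT (naive evaluation)."""
    n = len(x)
    return [sum(v * cmath.exp(-2j * math.pi * k * t / n)
                for t, v in enumerate(x)) / math.sqrt(n)
            for k in range(1, m + 1)]

def approx_correlation(x, y, m):
    """Estimate Pearson correlation from the first m DFT coefficients.

    For real, unit-normalised series the inner product is preserved by
    the unitary DFT, so keeping only m < n/2 coefficients (and their
    conjugate mirrors, hence the factor 2) gives a lossy estimate.
    """
    X = dft_coeffs(unit_normalise(x), m)
    Y = dft_coeffs(unit_normalise(y), m)
    return 2.0 * sum((a * b.conjugate()).real for a, b in zip(X, Y))

def pearson(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    va = sum((p - ma) ** 2 for p in a)
    vb = sum((q - mb) ** 2 for q in b)
    return cov / math.sqrt(va * vb) if va > 0 and vb > 0 else 0.0

def best_lag(x, y, max_lag):
    """Earliest lag in 0..max_lag maximising corr(x[t], y[t + lag])."""
    best, best_r = 0, -2.0
    for lag in range(max_lag + 1):
        r = pearson(x[:len(x) - lag], y[lag:])
        if r > best_r:  # strict '>' keeps the earliest lag on ties
            best, best_r = lag, r
    return best, best_r
```

The estimate from approx_correlation is accurate when most of the energy of both series sits in the retained low frequencies, which is the assumption that makes a few coefficients per window a useful summary. The exhaustive scan in best_lag costs time proportional to the window length for every lag and every pair, which is the cost that time- and space-efficient methods aim to avoid.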

MUSCLES [58] is designed specifically for forecasting (and thus can
handle missing values). However, it cannot find hidden variables and it
scales poorly for a large number of streams n, since it requires at
