2.1 Dimensionality reduction
Much of the work on stream mining has focused on finding interesting
patterns in a single stream, but multiple streams have also attracted
significant interest. Ganti et al. [22] propose a generic framework for
stream mining. Guha et al. [25] propose a one-pass k-median clustering
algorithm. [15] construct a decision tree online, by passing over the
data only once. Later, [29] and [54] addressed the problem of finding
patterns over concept-drifting streams.
The work in [35] proposes parameter-free methods for classic data
mining tasks (i.e., clustering, anomaly detection, classification),
based on compression. The work in [36] proposes a multi-resolution
clustering scheme for time series data. It uses the average coefficients
(low frequencies) of the wavelet transform to perform k-means clustering
and progressively refines the clusters by incorporating higher-level
detail coefficients. This approach converges much faster than clustering
the original, very high-dimensional series directly. Both approaches
require the complete data in advance. [4]
propose a framework for Phenomena Detection and Tracking (PDT) in
sensor networks. They define a phenomenon on discrete-valued streams
and develop query execution techniques based on multi-way hash join
with PDT-specific optimizations.
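
The multi-resolution clustering idea described above for [36] can be illustrated with a minimal sketch. It is not the implementation from [36]: it assumes unnormalised Haar averaging (pairwise means), a plain Lloyd-style k-means, series whose length is a power of two, and the hypothetical helper names haar_average, coarsen, upsample, kmeans and multires_kmeans. Centers found at a coarse resolution seed the next, finer resolution.

```python
import random


def haar_average(series):
    """Halve the resolution by averaging adjacent pairs (Haar approximation step)."""
    return [(series[i] + series[i + 1]) / 2.0 for i in range(0, len(series), 2)]


def coarsen(series, length):
    """Average pairs repeatedly until the series has the requested length."""
    while len(series) > length:
        series = haar_average(series)
    return series


def upsample(center):
    """Duplicate each value: the finer-level vector whose detail coefficients are zero."""
    return [v for v in center for _ in range(2)]


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(points, centers):
    """Index of the nearest center for every point."""
    return [min(range(len(centers)), key=lambda c: sq_dist(p, centers[c]))
            for p in points]


def kmeans(points, centers, max_iter=50):
    """Plain Lloyd iterations starting from the given centers."""
    labels = assign(points, centers)
    for _ in range(max_iter):
        new_centers = []
        for c in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centers.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centers.append(centers[c])  # keep an empty cluster's center
        new_labels = assign(points, new_centers)
        centers = new_centers
        if new_labels == labels:
            break
        labels = new_labels
    return labels, centers


def multires_kmeans(data, k, seed=0):
    """Cluster at length 2, then refine at lengths 4, 8, ... up to full length.

    Assumes every series has the same power-of-two length (an assumption
    of this sketch, not a statement about [36]).
    """
    n = len(data[0])
    rng = random.Random(seed)
    length = 2
    points = [coarsen(s, length) for s in data]
    centers = rng.sample(points, k)
    while True:
        labels, centers = kmeans(points, centers)
        if length >= n:
            return labels
        length *= 2
        points = [coarsen(s, length) for s in data]
        centers = [upsample(c) for c in centers]  # seed the finer level
```

Because the coarse levels have few dimensions, most iterations run on short vectors, and the finer levels typically start close to a good solution. Like the scheme in the text, this sketch needs the complete series up front.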
CluStream [1] is a flexible clustering framework with online and
offline components. The online component extends micro-cluster
information [61] by incorporating exponentially-sized sliding windows
while coalescing micro-cluster summaries. Actual clusters are found by
the offline component. StatStream [62] uses the DFT to summarise streams
within a finite window and then compute the highest pairwise
correlations among all pairs of streams, at each timestamp. BRAID [49]
addresses the problem of discovering lag correlations among multiple
streams. The focus is on time- and space-efficient methods for finding
the earliest and highest peak in the cross-correlation functions between
all pairs of streams. Similar to [42] (see below), BRAID employs a
representation with fidelity that decreases with age. The work in [39]
studies how to efficiently compute pairwise correlations among large
collections of time series, by combining compressed Fourier
representations with graph partitioning techniques. None of CluStream,
StatStream, or BRAID explicitly focuses on discovering hidden variables.
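
To make the notion of a lag correlation concrete, the following sketch computes the cross-correlation function between two equal-length series by brute force and reports the lag with the highest peak. It is only an illustration of the quantity being sought: it keeps the full history and evaluates every lag exactly, whereas the text describes BRAID as using time- and space-efficient methods with an age-decaying representation. The function names pearson, lag_correlations and best_lag are hypothetical.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # convention of this sketch for constant input
    return cov / math.sqrt(var_a * var_b)


def lag_correlations(x, y, max_lag):
    """Correlation of x[t] with y[t + lag] for lag = 0, 1, ..., max_lag."""
    return [pearson(x[:len(x) - lag], y[lag:]) for lag in range(max_lag + 1)]


def best_lag(x, y, max_lag):
    """Return (lag, correlation) at the highest peak; ties go to the earliest lag."""
    ccf = lag_correlations(x, y, max_lag)
    lag = max(range(len(ccf)), key=lambda l: ccf[l])
    return lag, ccf[lag]
```

For example, if y is a copy of x delayed by five time steps, best_lag(x, y, 20) is expected to report a lag of 5 with a correlation close to 1. The cost here is proportional to the series length times max_lag for each pair of streams, which is the expense that the streaming methods discussed above aim to avoid.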
MUSCLES [58] is designed specifically for forecasting (and can thus
handle missing values). However, it cannot find hidden variables and
it scales poorly for a large number of streams n, since it requires at