stream evolution can also affect the behavior of the underlying data mining algorithms, since their results can become stale over time. The broad classes of algorithms for change diagnosis in data streams are as follows:
Velocity Density Estimation: In velocity density estimation [8], we compute the rate at which the data density changes at different points in the data stream over time. Depending upon the direction of this rate of change, one may identify regions of dissolution, coagulation, and shift. Spatial profiles can also be constructed to determine the directions of shift in the underlying data. In addition, the velocity density concept can be used to identify those combinations of dimensions which show a high level of evolution. Another technique for change quantification is discussed in [37], which uses methods for quantifying differences between probability distributions in order to identify changes in the underlying data. In [59], a method is discussed for determining statistical changes in the underlying data. Clustering [10] can also be used to detect significant evolution in the underlying data: in [10], micro-clustering is used to determine significant clusters which have evolved in the underlying data.
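A simplified way to see velocity density at work is a finite difference of kernel density estimates over two adjacent time windows. The sketch below is an illustration, not the estimator from [8]. It assumes a Gaussian spatial kernel with bandwidth h_s and two windows of width h_t. The function names and the threshold eps are hypothetical, and [8] may define its temporal and spatial kernels differently. Shift detection through spatial profiles is not covered.

```python
import math
from typing import List, Sequence, Tuple

# One stream record: (point, timestamp). Hypothetical representation.
Record = Tuple[Tuple[float, ...], float]


def gaussian_kernel(x: Sequence[float], y: Sequence[float], h_s: float) -> float:
    """Unnormalized isotropic Gaussian kernel with spatial bandwidth h_s."""
    return math.exp(-math.dist(x, y) ** 2 / (2.0 * h_s * h_s))


def window_density(stream: List[Record], x: Sequence[float],
                   t_lo: float, t_hi: float, h_s: float) -> float:
    """Average kernel value at x over records with t_lo < t <= t_hi (0 if none)."""
    pts = [p for p, t in stream if t_lo < t <= t_hi]
    if not pts:
        return 0.0
    return sum(gaussian_kernel(x, p, h_s) for p in pts) / len(pts)


def velocity_density(stream: List[Record], x: Sequence[float],
                     t_now: float, h_t: float, h_s: float) -> float:
    """Finite-difference rate of density change at x over time horizon h_t
    (an assumed simplification, not the exact definition in [8])."""
    recent = window_density(stream, x, t_now - h_t, t_now, h_s)
    older = window_density(stream, x, t_now - 2 * h_t, t_now - h_t, h_s)
    return (recent - older) / h_t


def label_region(v: float, eps: float) -> str:
    """Rising density suggests coagulation, falling density suggests dissolution."""
    if v > eps:
        return "coagulation"
    if v < -eps:
        return "dissolution"
    return "stable"


if __name__ == "__main__":
    # Toy stream: points sit near (0, 0) early and move to (5, 5) later.
    stream = [((0.0, 0.0), float(t)) for t in range(1, 11)]
    stream += [((5.0, 5.0), float(t)) for t in range(11, 21)]
    for x in [(0.0, 0.0), (5.0, 5.0)]:
        v = velocity_density(stream, x, t_now=20.0, h_t=10.0, h_s=1.0)
        print(x, label_region(v, eps=1e-3))
```

On this toy stream, the region around (0, 0) is labeled as dissolving and the region around (5, 5) as coagulating. Densities are averaged per window, so the sketch measures relative density rather than raw arrival counts.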
A separate line of work is the detection of significant changes in the results of data mining algorithms caused by evolution. For example, [10] shows how to determine significant evolving clusters in the underlying data. In [13], a similar technique is used to keep a classification model up to date in the presence of evolving data. Micro-clustering is effective in this respect because it stores intermediate statistics of the underlying data in the form of clusters. In [13], a micro-cluster based nearest neighbor classifier is used to classify evolving data streams. The key idea is to construct class-specific micro-clusters over a variety of time horizons, and then use the time horizon with the greatest accuracy to perform the classification. The issue of stream evolution has also been studied for many other problems, such as synopsis construction and reservoir sampling [6]. We will discuss some of the synopsis construction methods later.
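The micro-cluster based, horizon-selecting classifier described above can be illustrated with a toy sketch. It is not the method of [13]. The names (MicroCluster, build_micro_clusters, nn_classify, choose_horizon), the fixed absorption radius, and the held-out validation window are all hypothetical. The sketch also rebuilds micro-clusters from scratch for each horizon, which is an assumption made for readability. Micro-clusters in the literature usually also keep squared sums and time statistics, and a space-bounded stream method would not re-scan past records.

```python
import math
from typing import List, Sequence, Tuple

# One labeled stream record: (feature vector, class label). Hypothetical.
Labeled = Tuple[Tuple[float, ...], str]


class MicroCluster:
    """Additive summary of same-class points: count and per-dimension linear sum."""

    def __init__(self, x: Sequence[float], label: str) -> None:
        self.label = label
        self.n = 1
        self.ls = list(x)

    def add(self, x: Sequence[float]) -> None:
        # Absorbing a point only updates the sums; raw points are not stored.
        self.n += 1
        for i, v in enumerate(x):
            self.ls[i] += v

    def centroid(self) -> List[float]:
        return [s / self.n for s in self.ls]


def build_micro_clusters(points: List[Labeled], radius: float) -> List[MicroCluster]:
    """One pass: add each point to the nearest same-class micro-cluster whose
    centroid is within `radius`, otherwise start a new micro-cluster."""
    clusters: List[MicroCluster] = []
    for x, y in points:
        same = [c for c in clusters if c.label == y]
        best = min(same, key=lambda c: math.dist(c.centroid(), x), default=None)
        if best is not None and math.dist(best.centroid(), x) <= radius:
            best.add(x)
        else:
            clusters.append(MicroCluster(x, y))
    return clusters


def nn_classify(clusters: List[MicroCluster], x: Sequence[float]) -> str:
    """Nearest-neighbor rule over micro-cluster centroids."""
    return min(clusters, key=lambda c: math.dist(c.centroid(), x)).label


def choose_horizon(stream: List[Labeled], horizons: List[int],
                   n_val: int, radius: float) -> int:
    """Return the horizon h (number of most recent training records) whose
    micro-clusters best classify the last n_val labeled records."""
    train, val = stream[:-n_val], stream[-n_val:]

    def accuracy(h: int) -> float:
        clusters = build_micro_clusters(train[-h:], radius)
        return sum(nn_classify(clusters, x) == y for x, y in val) / len(val)

    return max(horizons, key=accuracy)


if __name__ == "__main__":
    # Toy drift: both classes move 3 units to the right late in the stream.
    stream = [((0.0,), "a"), ((4.0,), "b")] * 20 + [((3.0,), "a"), ((7.0,), "b")] * 5
    print(choose_horizon(stream, horizons=[6, 46], n_val=4, radius=3.5))  # → 6
```

With the long horizon, the old micro-clusters absorb the drifted points and their centroids barely move. As a result, recent class "a" points near 3 are assigned to class "b", for an accuracy of 0.5 on the validation window. The short horizon reflects only the current concept and classifies all four validation records correctly.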
3.5 Synopsis Construction in Data Streams
The large volume of data streams poses unique space and time constraints on the computation process. Many query processing algorithms, database operations, and mining algorithms require efficient execution, which can be difficult to achieve with a fast data stream. Furthermore, since it is impossible to fit the entire data stream within the available space, the space efficiency of the approach is a major concern. In many cases, it