bedded devices in the network are smart enough to be able to compute
functions of modest complexity. Such an approach results in a reduction
of the transmission costs, both because of the smaller distances of trans-
mission in a clustered environment, and also because only the aggregated
data is transmitted (which has much lower volume than the raw data).
It is possible to design different kinds of cluster hierarchies in order to
optimize the transmission costs in the underlying network. A detailed
discussion of the different aspects of in-network query processing may
be found in [63, 78].
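The saving from transmitting only aggregated data can be illustrated with a minimal sketch. It assumes, purely for illustration, that the query is an average over sensor readings and that each cluster head forwards a (count, sum) pair instead of raw values. The names PartialAggregate, summarize, and aggregate_cluster are hypothetical and are not taken from [63, 78].

```python
from dataclasses import dataclass
from typing import Iterable, Sequence


@dataclass(frozen=True)
class PartialAggregate:
    """Fixed-size summary that a node forwards instead of its raw readings."""
    count: int = 0
    total: float = 0.0

    def merge(self, other: "PartialAggregate") -> "PartialAggregate":
        # Counts and sums combine by addition, so merging can happen at any level.
        return PartialAggregate(self.count + other.count, self.total + other.total)

    @property
    def mean(self) -> float:
        return self.total / self.count if self.count else float("nan")


def summarize(readings: Sequence[float]) -> PartialAggregate:
    """Computed locally on a single sensor (assumed capable of simple arithmetic)."""
    return PartialAggregate(len(readings), float(sum(readings)))


def aggregate_cluster(sensor_readings: Iterable[Sequence[float]]) -> PartialAggregate:
    """Computed on a cluster head: merge the summaries of its member sensors."""
    result = PartialAggregate()
    for readings in sensor_readings:
        result = result.merge(summarize(readings))
    return result


# Hypothetical two-cluster network; each inner list is one sensor's readings.
clusters = {
    "cluster_a": [[20.0, 21.0], [19.0]],
    "cluster_b": [[30.0, 31.0, 29.0]],
}

# Each cluster head sends one PartialAggregate upstream, not six raw values.
head_summaries = [aggregate_cluster(sensors) for sensors in clusters.values()]

network = PartialAggregate()
for summary in head_summaries:
    network = network.merge(summary)
```

In this sketch the base station receives two fixed-size summaries regardless of how many readings each cluster collected. Aggregates that cannot be combined by simple merging (an exact median, for example) would need a different summary.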
3. Stream Mining Algorithms
In this section, we discuss the key stream mining problems and the
challenges associated with each of them. We also provide a broad
overview of the different directions of research for these problems.
3.1 Data Stream Clustering
Clustering is a widely studied problem in the data mining literature.
However, it is more difficult to adapt arbitrary clustering algorithms to
data streams because of the one-pass constraint on the data set. An
interesting adaptation of the k-means algorithm is discussed in [43]; it
uses a partitioning-based approach to create clusters over the entire
data stream. In practical applications, however,
it is often desirable to be able to examine clusters over user-specified
time-horizons. For example, an analyst may desire to examine the be-
havior of the clusters in the data stream over the past one week, the
past one month, or the past year. In such cases, it is desirable to store
intermediate cluster statistics, so that it is possible to leverage these in
order to examine the behavior of the underlying data.
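The idea of storing intermediate cluster statistics can be illustrated with a minimal sketch. It assumes, for illustration only, that a cluster is summarized by a point count, a per-dimension sum, and a per-dimension sum of squares, and that a snapshot of this summary is saved at some earlier time. The class name ClusterSummary and its methods are hypothetical; this is not the data structure of [10] or [81].

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class ClusterSummary:
    """Additive statistics of one cluster: count, sum, and sum of squares."""
    n: int
    linear_sum: Tuple[float, ...]
    squared_sum: Tuple[float, ...]

    @staticmethod
    def empty(dim: int) -> "ClusterSummary":
        zeros = (0.0,) * dim
        return ClusterSummary(0, zeros, zeros)

    def add_point(self, x: Sequence[float]) -> "ClusterSummary":
        # One pass over the stream: each point only updates running sums.
        return ClusterSummary(
            self.n + 1,
            tuple(s + v for s, v in zip(self.linear_sum, x)),
            tuple(s + v * v for s, v in zip(self.squared_sum, x)),
        )

    def __sub__(self, other: "ClusterSummary") -> "ClusterSummary":
        # Subtracting an older snapshot leaves the statistics of the points
        # that arrived after that snapshot was taken.
        return ClusterSummary(
            self.n - other.n,
            tuple(a - b for a, b in zip(self.linear_sum, other.linear_sum)),
            tuple(a - b for a, b in zip(self.squared_sum, other.squared_sum)),
        )

    def centroid(self) -> Tuple[float, ...]:
        return tuple(s / self.n for s in self.linear_sum)

    def variance(self) -> Tuple[float, ...]:
        # Per-dimension variance from the first- and second-order sums.
        return tuple(
            q / self.n - (s / self.n) ** 2
            for s, q in zip(self.linear_sum, self.squared_sum)
        )


# Hypothetical stream of 2-D points; a snapshot is stored after the third one.
points = [(1.0, 2.0), (3.0, 4.0), (5.0, 6.0), (7.0, 8.0), (9.0, 10.0)]
summary = ClusterSummary.empty(2)
snapshot = None
for i, p in enumerate(points, start=1):
    summary = summary.add_point(p)
    if i == 3:
        snapshot = summary

# Statistics over the recent horizon only (the last two points).
recent = summary - snapshot
```

Here recent describes only the points seen after the snapshot, without revisiting the raw stream. How often snapshots are stored, and how points are assigned to clusters, are left out of this sketch.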
One such technique is micro-clustering [10], in which we use cluster
feature vectors [81] in order to perform stream clustering. The cluster
feature vectors keep track of the first-order and second-order moments
of the underlying data in order to perform the clustering. These features
satisfy the following critical properties which are relevant to the stream
clustering process:
Additivity Property: Statistics such as the first- or second-order
moments can be maintained by simple addition of statistics over the
data points. This is critical in being able to maintain the statistics
efficiently over a fast data stream. Furthermore, additivity also
implies subtractivity; thus, it is possible to obtain the