MINING SENSOR DATA STREAMS - Managing and Mining Sensor Data


bedded devices in the network are smart enough to be able to compute

functions of modest complexity. Such an approach results in a reduction

of the transmission costs, both because of the smaller distances of transmission

in a clustered environment, and also because only the aggregated

data is transmitted (which has much lower volume than the raw data).

It is possible to design different kinds of cluster hierarchies in order to

optimize the transmission costs in the underlying network. A detailed

discussion of the different aspects of in-network query processing may

be found in [63, 78].
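
To make the volume reduction concrete, the following is a minimal sketch of aggregation at cluster heads, written in Python. It is not taken from [63, 78]. The `Partial` summary format, the function names, and the cluster names `head_a` and `head_b` are hypothetical, and the sketch assumes that the query only needs a count, sum, minimum, and maximum, all of which can be merged from partial results.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Partial:
    """Summary a cluster head sends upstream (hypothetical format)."""
    count: int
    total: float
    minimum: float
    maximum: float

    def merge(self, other: "Partial") -> "Partial":
        # Each field can be combined without access to the raw readings.
        return Partial(
            self.count + other.count,
            self.total + other.total,
            min(self.minimum, other.minimum),
            max(self.maximum, other.maximum),
        )


def summarize(readings):
    """Run at a cluster head: collapse the raw readings of its sensors."""
    readings = list(readings)
    return Partial(len(readings), sum(readings), min(readings), max(readings))


def aggregate_at_base(partials):
    """Run at the base station: merge the per-cluster summaries."""
    partials = list(partials)
    result = partials[0]
    for partial in partials[1:]:
        result = result.merge(partial)
    return result


# Assumed toy readings; each key stands for one cluster head.
clusters = {
    "head_a": [20.0, 21.0, 19.0],
    "head_b": [30.0, 32.0],
}
partials = [summarize(r) for r in clusters.values()]
overall = aggregate_at_base(partials)
mean = overall.total / overall.count
```

In this sketch each cluster head forwards four numbers regardless of how many sensors report to it, so the savings grow with cluster size. Aggregates that cannot be merged from such partial results (an exact median, for example) would need a different summary.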

3. Stream Mining Algorithms

In this section, we discuss the key stream mining problems and

the challenges associated with each problem. We also

provide a broad overview of the different directions of research for these

problems.

3.1 Data Stream Clustering

Clustering is a widely studied problem in the data mining literature.

However, it is more difficult to adapt arbitrary clustering algorithms to

data streams because of one-pass constraints on the data set. An interesting

adaptation of the k-means algorithm has been discussed in [43],

which uses a partitioning-based approach on the entire data set. This

approach uses an adaptation of a k-means technique in order to create

clusters over the entire data stream. However, in practical applications,

it is often desirable to be able to examine clusters over user-specified

time-horizons. For example, an analyst may desire to examine the behavior

of the clusters in the data stream over the past week, the

past month, or the past year. In such cases, it is desirable to store

intermediate cluster statistics, so that it is possible to leverage these in

order to examine the behavior of the underlying data.
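
As a rough illustration of how stored intermediate statistics can serve a user-specified horizon, the sketch below keeps running sums for a single cluster and saves a snapshot at an earlier time. It assumes that the statistics are a point count, a per-dimension sum, and a per-dimension sum of squares. The class `ClusterStats` and its method names are hypothetical, and the sketch omits the assignment of points to multiple clusters, so it should not be read as the algorithm of any cited work.

```python
class ClusterStats:
    """Running sums for one cluster (hypothetical helper)."""

    def __init__(self, dim):
        self.n = 0
        self.linear_sum = [0.0] * dim
        self.squared_sum = [0.0] * dim

    def add(self, point):
        # One pass: each point updates the sums and is then discarded.
        self.n += 1
        for i, value in enumerate(point):
            self.linear_sum[i] += value
            self.squared_sum[i] += value * value

    def snapshot(self):
        # Independent copy of the current sums, to be stored for later use.
        copy = ClusterStats(len(self.linear_sum))
        copy.n = self.n
        copy.linear_sum = list(self.linear_sum)
        copy.squared_sum = list(self.squared_sum)
        return copy

    def minus(self, older):
        # Statistics of the points that arrived after the snapshot `older`.
        diff = ClusterStats(len(self.linear_sum))
        diff.n = self.n - older.n
        diff.linear_sum = [a - b for a, b in zip(self.linear_sum, older.linear_sum)]
        diff.squared_sum = [a - b for a, b in zip(self.squared_sum, older.squared_sum)]
        return diff

    def centroid(self):
        return [s / self.n for s in self.linear_sum]

    def variance(self):
        # Per-dimension variance: E[x^2] - (E[x])^2.
        return [
            ss / self.n - (ls / self.n) ** 2
            for ls, ss in zip(self.linear_sum, self.squared_sum)
        ]


stats = ClusterStats(dim=2)
for point in [(1.0, 2.0), (3.0, 4.0)]:      # older part of the stream
    stats.add(point)
stored = stats.snapshot()                    # saved intermediate statistics
for point in [(10.0, 10.0), (12.0, 14.0)]:  # recent part of the stream
    stats.add(point)

recent = stats.minus(stored)
print(recent.n, recent.centroid(), recent.variance())
```

Here `recent` describes only the two points that arrived after the snapshot, even though those points were never stored individually. Which snapshots to retain, and at what granularity, is a separate design question that this sketch does not address.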

One such technique is micro-clustering [10], in which we use cluster

feature vectors [81] in order to perform stream clustering. The cluster

feature vectors keep track of the first-order and second-order moments

of the underlying data in order to perform the clustering. These features

satisfy the following critical properties which are relevant to the stream

clustering process:

Additivity Property: The statistics such as the first- or second-

order moments can be maintained as a simple addition of statistics

over data points. This is critical in being able to maintain the

statistics efficiently over a fast data stream. Furthermore, additivity

also implies subtractivity; thus, it is possible to obtain the
