statistics over a particular time horizon, by subtracting out the
statistics at the beginning of the horizon from the statistics at the
end of the horizon.
Computational Convenience: The first and second order statis-
tics can be used to compute a vast array of cluster parameters such
as the cluster centroid and radius. This makes it possible to com-
pute important cluster characteristics in real time.
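The two properties above can be sketched in a few lines of Python. This is an illustrative sketch only (the class and method names are hypothetical, not taken from [10]): it keeps the usual (n, LS, SS) summary, supports subtraction of snapshots to obtain horizon statistics, and derives the centroid and radius from the sums.

```python
import math

class MicroCluster:
    """Additive (n, LS, SS) summary of the points absorbed by a cluster."""

    def __init__(self, dim):
        self.n = 0                 # number of points absorbed
        self.ls = [0.0] * dim      # first-order (linear) sums, per dimension
        self.ss = [0.0] * dim      # second-order (squared) sums, per dimension

    def add(self, point):
        self.n += 1
        for i, x in enumerate(point):
            self.ls[i] += x
            self.ss[i] += x * x

    def subtract(self, other):
        """Horizon statistics: snapshot at the end minus snapshot at the start."""
        result = MicroCluster(len(self.ls))
        result.n = self.n - other.n
        result.ls = [a - b for a, b in zip(self.ls, other.ls)]
        result.ss = [a - b for a, b in zip(self.ss, other.ss)]
        return result

    def centroid(self):
        return [s / self.n for s in self.ls]

    def radius(self):
        # RMS deviation from the centroid: average per-dimension variance,
        # where variance = SS/n - (LS/n)^2.
        var = sum(ss / self.n - (ls / self.n) ** 2
                  for ls, ss in zip(self.ls, self.ss)) / len(self.ls)
        return math.sqrt(max(var, 0.0))
```

Because every field is a simple sum, both maintenance (one addition per dimension per point) and horizon queries (one subtraction per dimension) run in constant time per cluster, which is what makes the statistics usable in real time.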
It has been shown in [10] that the micro-cluster technique is much more
effective and versatile than the k-means based stream technique dis-
cussed in [43]. This broad technique has also been extended to a variety
of other kinds of data. Some examples of such data are as follows:
High Dimensional Data: The stream clustering method can
also be extended to the concept of projected clustering [5]. A tech-
nique for high dimensional projected clustering of data streams is
discussed in [11]. In this case, the same micro-cluster statistics
are used for maintaining the characteristics of the clusters, except
that we also maintain additional information which keeps track of
the projected dimensions in each cluster. The projected dimen-
sions can be used in conjunction with the cluster statistics to com-
pute the projected distances which are required for intermediate
computations. Another innovation proposed in [11] is the use of
a decay-based approach for clustering. The idea in the decay-based
approach is relevant to the case of an evolving data stream model,
and is applicable not just to the high dimensional case, but to any
of the above variants of the micro-cluster model. In this approach,
the weight of a data point is defined as 2^(−λ·t), where t is the current
time-instant. Thus, each data point has a half-life of 1/λ, which is
the time in which the weight of the data point reduces by a factor
of 2. We note that the decay-based approach poses a challenge
because the micro-cluster statistics are affected at each clock tick,
even if no points arrive from the data stream. In order to deal with
this problem, a lazy approach is applied to decay-based updates, in
which we update the decay-behavior for a micro-cluster only if a
data point is added to it. The idea is that as long as we keep track
of the last time t_s at which the micro-cluster was updated, we only
need to multiply the micro-cluster statistics by 2^(−λ·(t_c − t_s)), where t_c
is the current time instant. After multiplying the statistics by
this factor, it is possible to add the micro-cluster statistics of the
current data point. This approach can be used since the statistics
of each micro-cluster decay by the same factor in each tick, and it
is therefore possible to implicitly keep track of the decayed values.
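The lazy decay update described above can be sketched as follows. This is a minimal illustration under the same (n, LS, SS) micro-cluster statistics; the class and field names are hypothetical, not taken from [11]:

```python
class DecayedMicroCluster:
    """Micro-cluster whose statistics are decayed lazily, only on insertion."""

    def __init__(self, dim, lam):
        self.lam = lam             # decay rate λ; the half-life is 1/λ
        self.n = 0.0               # decayed point count
        self.ls = [0.0] * dim      # decayed first-order sums
        self.ss = [0.0] * dim      # decayed second-order sums
        self.last_update = 0.0     # t_s: last time this cluster was updated

    def add(self, point, t_now):
        # Lazily apply all decay accumulated since the last update:
        # every statistic shrinks by the same factor 2^(-λ·(t_c - t_s)),
        # so no per-tick maintenance is needed while no points arrive.
        factor = 2.0 ** (-self.lam * (t_now - self.last_update))
        self.n *= factor
        self.ls = [v * factor for v in self.ls]
        self.ss = [v * factor for v in self.ss]
        # Then fold in the new point with full (undecayed) weight.
        self.n += 1.0
        for i, x in enumerate(point):
            self.ls[i] += x
            self.ss[i] += x * x
        self.last_update = t_now
```

For example, with λ = 1, a cluster holding one point inserted at time 0 contributes weight 0.5 when the next point arrives at time 1, so the decayed count becomes 0.5 + 1 = 1.5 without the cluster ever having been touched in between.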