The work in [15] constructs a density-based framework to summarize the uncertain data stream effectively, and uses this summary for classification purposes.
3.3 Frequent Pattern Mining
The problem of frequent pattern mining was first introduced in [16], and has been extensively analyzed for the conventional case of disk-resident data sets. In the case of data streams, one may wish to find the frequent itemsets either over a sliding window or over the entire data stream [44, 53]. Accordingly, the problem of frequent pattern mining in data streams can be studied under several models:
Entire Data Stream Model: In this model, the frequent patterns need to be mined over the entire data stream. Thus, the main difference from a conventional pattern mining algorithm is that the frequent patterns need to be mined in one pass over the entire data stream, whereas most frequent pattern mining algorithms require multiple passes in order to estimate the frequency of patterns of different sizes in the data. A natural method for frequent pattern counting is to use sketch-based algorithms. Sketches are often used in order to determine heavy-hitters in data streams, and therefore an extension of the methodology to the problem of finding frequent patterns is natural.
ral. Along this line, Manku and Motwani [64] proposed the first one lass
algorithm called Lossy Counting , in order to find all frequent itemsets
over a data stream. The algorithm allows false positives, but not false
negatives. Thus, for a given support level s , the algorithm is guaranteed
not to contain all frequent itemsets whose support is greater than s
.
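As a concrete illustration, the following is a minimal sketch of the Lossy Counting idea, specialized to counting single items rather than itemsets (the algorithm of [64] generalizes this bookkeeping to sets of items). The class and variable names are our own; the error parameter epsilon controls the trade-off between space and accuracy.

    import math

    class LossyCounter:
        """One-pass approximate item counting in the style of Lossy Counting [64].

        No false negatives: every item with true frequency >= s * n is reported;
        any reported item has true frequency >= (s - epsilon) * n.
        """

        def __init__(self, epsilon):
            self.epsilon = epsilon
            self.width = math.ceil(1.0 / epsilon)   # bucket width w = ceil(1/eps)
            self.n = 0                               # items seen so far
            self.table = {}                          # item -> (count, max_error)

        def add(self, item):
            self.n += 1
            bucket = math.ceil(self.n / self.width)  # id of the current bucket
            if item in self.table:
                count, err = self.table[item]
                self.table[item] = (count + 1, err)
            else:
                # a newly tracked item may have been seen up to bucket-1
                # times before it entered the table
                self.table[item] = (1, bucket - 1)
            if self.n % self.width == 0:
                # at each bucket boundary, prune entries that cannot reach
                # the boundary count even in the worst case
                self.table = {i: (c, e) for i, (c, e) in self.table.items()
                              if c + e > bucket}

        def frequent(self, s):
            # report every item whose lower-bound count clears (s - eps) * n
            return [i for i, (c, _) in self.table.items()
                    if c >= (s - self.epsilon) * self.n]

Halving epsilon doubles the bucket width, and hence the space, but tightens the guarantee correspondingly: every reported item is assured a true count of at least (s - epsilon) * n.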
Another interesting approach in [80] determines all the frequent patterns whose support is greater than s with probability at least 1 - δ, where the value of δ can be made as small as desired, as long as one is willing to add space and time complexity proportional to ln(1/δ). Thus, this model does not allow false positives, but may miss some of the frequent patterns (i.e., it admits false negatives). The main advantage of such a technique is that it is possible to provide a more concise set of frequent patterns, at the expense of losing some of the patterns with a probability which is quite low for practical purposes.
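The logarithmic dependence on 1/δ comes from a Chernoff-style tail bound. As a rough back-of-the-envelope sketch (using a standard multiplicative Chernoff bound, not necessarily the exact constants used in [80]), one can compute how many transactions suffice before the observed support of an itemset with true support s is accurate to within a relative error epsilon, except with probability at most delta:

    import math

    def transactions_needed(s, epsilon, delta):
        """Number of transactions after which the observed support of an
        itemset with true support s lies within (1 +/- epsilon) * s, except
        with probability at most delta.  Standard multiplicative Chernoff
        bound; the constants in [80] differ, but the ln(1/delta) growth
        is the same.
        """
        return math.ceil(3.0 * math.log(2.0 / delta) / (epsilon ** 2 * s))

    # shrinking delta tenfold adds only an additive ln(10) term
    for delta in (0.1, 0.01, 0.001):
        print(delta, transactions_needed(s=0.01, epsilon=0.1, delta=delta))

Running the loop shows that driving delta from 0.1 down to 0.001 increases the required number of transactions by a factor of only about 2.5, which is the practical appeal of the ln(1/δ) term.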
Sliding Window Model: In many cases, the data stream may evolve over time, as a result of which it is desirable to determine all the frequent patterns over a particular sliding window. A method for determining the frequent patterns over a sliding window is discussed in [29]. The main assumption of this approach is that the number of frequent patterns is not very large, and therefore it is possible to hold the transactions in each sliding window in main memory. The main focus of this approach is to determine closed frequent itemsets over the data stream.
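To make the sliding window model concrete, the following is a naive, exact sketch of the bookkeeping it implies, under the stated assumption that the window fits in main memory: it counts all small subsets of each transaction and expires the oldest transaction as the window slides. This is not the closed-itemset algorithm of [29], which maintains its counts incrementally and far more compactly; the class name and the max_itemset_size cap are our own illustrative choices.

    from collections import deque
    from itertools import combinations

    class SlidingWindowMiner:
        """Naive exact frequent-itemset counts over a sliding window of
        transactions, illustrating the in-memory window assumption."""

        def __init__(self, window_size, max_itemset_size=3):
            self.window = deque()
            self.window_size = window_size
            self.max_k = max_itemset_size
            self.counts = {}          # itemset (sorted tuple) -> count

        def _update(self, transaction, delta):
            # add or subtract every subset of the transaction up to size max_k
            items = sorted(transaction)
            for k in range(1, self.max_k + 1):
                for subset in combinations(items, k):
                    self.counts[subset] = self.counts.get(subset, 0) + delta
                    if self.counts[subset] == 0:
                        del self.counts[subset]

        def add(self, transaction):
            self.window.append(transaction)
            self._update(transaction, +1)
            if len(self.window) > self.window_size:
                # expire the oldest transaction as the window slides
                self._update(self.window.popleft(), -1)

        def frequent(self, s):
            threshold = s * len(self.window)
            return {iset: c for iset, c in self.counts.items() if c >= threshold}

The exponential blow-up in subset counting is exactly why [29] restricts attention to closed frequent itemsets, which represent the same frequency information without enumerating every subset.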