The work in [15] constructs a density-based framework to summarize the uncertain data stream effectively, and uses this summary for classification purposes.
3.3 Frequent Pattern Mining
The problem of frequent pattern mining was first introduced in [16], and has been extensively analyzed for the conventional case of disk-resident data sets. In the case of data streams, one may wish to find the frequent itemsets either over a sliding window or over the entire data stream [44, 53]. Accordingly, the problem of frequent pattern mining in data streams can be studied under several models:
Entire Data Stream Model: In this model, the frequent patterns need to be mined over the entire data stream. Thus, the main difference from a conventional pattern mining algorithm is that the frequent patterns need to be mined in one pass over the entire data stream, whereas most frequent pattern mining algorithms require multiple passes in order to estimate the frequency of patterns of different sizes in the data. A natural method for frequent pattern counting is to use sketch-based algorithms. Sketches are often used in order to determine heavy-hitters in data streams, and therefore an extension of the methodology to the problem of finding frequent patterns is natural.
ral. Along this line, Manku and Motwani [64] proposed the first one lass
algorithm called Lossy Counting , in order to find all frequent itemsets
over a data stream. The algorithm allows false positives, but not false
negatives. Thus, for a given support level s , the algorithm is guaranteed
not to contain all frequent itemsets whose support is greater than s
.
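As a concrete illustration, the following is a minimal sketch of the Lossy Counting idea, specialized to counting single items rather than itemsets (the algorithm of [64] generalizes this bookkeeping to sets of items). The class and variable names are our own; the error parameter epsilon controls the trade-off between space and accuracy.

    import math

    class LossyCounter:
        """One-pass approximate item counting in the style of Lossy Counting [64].

        No false negatives: every item with true frequency >= s * n is reported;
        any reported item has true frequency >= (s - epsilon) * n.
        """

        def __init__(self, epsilon):
            self.epsilon = epsilon
            self.width = math.ceil(1.0 / epsilon)   # bucket width w = ceil(1/eps)
            self.n = 0                               # items seen so far
            self.table = {}                          # item -> (count, max_error)

        def add(self, item):
            self.n += 1
            bucket = math.ceil(self.n / self.width)  # id of the current bucket
            if item in self.table:
                count, err = self.table[item]
                self.table[item] = (count + 1, err)
            else:
                # a newly tracked item may have been seen up to bucket-1
                # times before it entered the table
                self.table[item] = (1, bucket - 1)
            if self.n % self.width == 0:
                # at each bucket boundary, prune entries that cannot reach
                # the boundary count even in the worst case
                self.table = {i: (c, e) for i, (c, e) in self.table.items()
                              if c + e > bucket}

        def frequent(self, s):
            # report every item whose lower-bound count clears (s - eps) * n
            return [i for i, (c, _) in self.table.items()
                    if c >= (s - self.epsilon) * self.n]

Halving epsilon doubles the bucket width, and hence the space, but tightens the guarantee correspondingly: every reported item is assured a true count of at least (s - epsilon) * n.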
Another interesting approach in [80] determines all the frequent patterns whose support is greater than s with probability at least 1 - δ, where the value of δ can be made as small as desired, as long as one is willing to add space and time complexity proportional to ln(1/δ). Thus, this model does not allow false positives, but may miss some of the frequent patterns (i.e., it admits false negatives). The main advantage of such a technique is that it is possible to provide a more concise set of frequent patterns, at the expense of losing some of the patterns with a probability which is quite low for practical purposes.
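The logarithmic dependence on 1/δ comes from a Chernoff-style tail bound. As a rough back-of-the-envelope sketch (using a standard multiplicative Chernoff bound, not necessarily the exact constants used in [80]), one can compute how many transactions suffice before the observed support of an itemset with true support s is accurate to within a relative error epsilon, except with probability at most delta:

    import math

    def transactions_needed(s, epsilon, delta):
        """Number of transactions after which the observed support of an
        itemset with true support s lies within (1 +/- epsilon) * s, except
        with probability at most delta.  Standard multiplicative Chernoff
        bound; the constants in [80] differ, but the ln(1/delta) growth
        is the same.
        """
        return math.ceil(3.0 * math.log(2.0 / delta) / (epsilon ** 2 * s))

    # shrinking delta tenfold adds only an additive ln(10) term
    for delta in (0.1, 0.01, 0.001):
        print(delta, transactions_needed(s=0.01, epsilon=0.1, delta=delta))

Running the loop shows that driving delta from 0.1 down to 0.001 increases the required number of transactions by a factor of only about 2.5, which is the practical appeal of the ln(1/δ) term.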
Sliding Window Model: In many cases, the data stream may evolve over time, as a result of which it is desirable to determine all the frequent patterns over a particular sliding window. A method for determining the frequent patterns over a sliding window is discussed in [29]. The main assumption of this approach is that the number of frequent patterns is not very large, and therefore it is possible to hold the transactions in each sliding window in main memory. The main focus of this approach is to determine closed frequent itemsets over the data stream.
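To make the sliding window model concrete, the following is a naive, exact sketch of the bookkeeping it implies, under the stated assumption that the window fits in main memory: it counts all small subsets of each transaction and expires the oldest transaction as the window slides. This is not the closed-itemset algorithm of [29], which maintains its counts incrementally and far more compactly; the class name and the max_itemset_size cap are our own illustrative choices.

    from collections import deque
    from itertools import combinations

    class SlidingWindowMiner:
        """Naive exact frequent-itemset counts over a sliding window of
        transactions, illustrating the in-memory window assumption."""

        def __init__(self, window_size, max_itemset_size=3):
            self.window = deque()
            self.window_size = window_size
            self.max_k = max_itemset_size
            self.counts = {}          # itemset (sorted tuple) -> count

        def _update(self, transaction, delta):
            # add or subtract every subset of the transaction up to size max_k
            items = sorted(transaction)
            for k in range(1, self.max_k + 1):
                for subset in combinations(items, k):
                    self.counts[subset] = self.counts.get(subset, 0) + delta
                    if self.counts[subset] == 0:
                        del self.counts[subset]

        def add(self, transaction):
            self.window.append(transaction)
            self._update(transaction, +1)
            if len(self.window) > self.window_size:
                # expire the oldest transaction as the window slides
                self._update(self.window.popleft(), -1)

        def frequent(self, s):
            threshold = s * len(self.window)
            return {iset: c for iset, c in self.counts.items() if c >= threshold}

The exponential blow-up in subset counting is exactly why [29] restricts attention to closed frequent itemsets, which represent the same frequency information without enumerating every subset.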