starting point, a mining result is generated based on the window of data received
so far. As the sequence continues, the window is updated and so are the results.
However, it is not necessarily true that we want to give equal consideration to all
the data received from the start up to the present. Consequently, several standard
data window models exist. A window is a subsequence between the i-th and j-th
transactions, denoted as T_{i,j} = (T_i, T_{i+1}, ..., T_j), where i ≤ j. A user can ask different
types of frequent pattern-mining questions over different types of window models.
Landmark Window In this model, we seek the frequent patterns contained in the
window from a fixed starting timepoint s to the current time t. In other words, we
are trying to find the frequent patterns over the window T_{s,t}. A special case of the
landmark window is when s = 1; in this case, we are interested in the frequent
patterns over the entire data stream. Clearly, the difficulty of solving the special case
of s = 1 is essentially the same as that of the more general case, and all cases require
an efficient single-pass mining algorithm. For simplicity, we will focus on the case
where the full data stream is the window.
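As a rough illustration of single-pass mining over the full stream, the sketch below keeps exact per-item counts from the landmark s = 1 onward. The function name, transaction format, and min_support threshold are choices made for this example only, and it counts single items rather than itemsets; practical stream algorithms would keep an approximate, bounded-memory summary instead of exact counts.

```python
from collections import Counter

def landmark_frequent_items(stream, min_support):
    """Single pass from the landmark s = 1 to the present (illustrative only).

    `stream` yields transactions (iterables of items); exact per-item counts
    are kept, which is feasible only when the number of distinct items is small.
    """
    counts = Counter()
    frequent = set()
    n = 0  # number of transactions seen so far
    for transaction in stream:
        n += 1
        counts.update(set(transaction))  # count each item at most once per transaction
        # Current mining result: items whose support so far meets the threshold.
        frequent = {item for item, c in counts.items() if c >= min_support * n}
    return frequent

# Example: items appearing in at least half of the transactions seen so far.
print(landmark_frequent_items([['a', 'b'], ['a', 'c'], ['a', 'b', 'd']], 0.5))  # {'a', 'b'}
```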
Note that in this model, we treat each timepoint after the starting point as equally
important. However, in many cases, we are more interested in the recent timepoints.
The following models address this issue.
Sliding Window Given a window width w and current timepoint t, we are interested
in the frequent patterns occurring in the window [t − w + 1, t]. As time advances,
the window will keep its width and move along with the current timepoint. In this
model, we are not interested in the data which arrived before the timepoint t − w + 1.
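One simple way to realize this model is to buffer the last w transactions and subtract the contribution of the transaction that slides out of the window. The class below is a sketch under that assumption; it tracks only per-item counts, and the class and method names are illustrative.

```python
from collections import Counter, deque

class SlidingWindowCounts:
    """Per-item counts over the last w transactions (illustrative sketch)."""

    def __init__(self, w):
        self.w = w
        self.window = deque()   # transactions currently inside [t - w + 1, t]
        self.counts = Counter()

    def add(self, transaction):
        items = set(transaction)
        self.window.append(items)
        self.counts.update(items)
        if len(self.window) > self.w:
            # The transaction that arrived at time t - w slides out of the window.
            self.counts.subtract(self.window.popleft())

    def frequent(self, min_support):
        n = len(self.window)
        return {item for item, c in self.counts.items() if n and c >= min_support * n}
```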
Damped Window Model This model assigns greater weight to more recently ar-
rived transactions. A simple way to do that is to define a decay rate δ, 0 < δ < 1 [14].
As each new data transaction arrives, the support levels of the previously recorded
patterns are multiplied by δ to reduce their significance. Thus, a pattern that occurred
k time steps ago has a weight of δ^k. The total support for a pattern is the sum of its
time-decayed counts.
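One way to maintain such time-decayed supports without rescaling every recorded pattern on each arrival is to decay lazily: store the time of the last update and scale by δ raised to the number of elapsed steps when the entry is next touched. The sketch below follows that idea for single items; the class and method names are assumptions made for this example.

```python
class DecayedCounts:
    """Time-decayed item counts for the damped window model (illustrative sketch).

    Each entry stores (decayed_count, last_update_time) and is scaled lazily by
    delta**(elapsed steps), which yields the same weights delta**k as rescaling
    every entry at every step.
    """

    def __init__(self, delta):
        assert 0 < delta < 1
        self.delta = delta
        self.t = 0          # current timestep (one per transaction)
        self.entries = {}   # item -> (decayed_count, last_update_time)

    def add(self, transaction):
        self.t += 1
        for item in set(transaction):
            count, last = self.entries.get(item, (0.0, self.t))
            # Bring the stored count up to the current time, then add weight 1
            # for the occurrence in the transaction that just arrived.
            self.entries[item] = (count * self.delta ** (self.t - last) + 1.0, self.t)

    def support(self, item):
        count, last = self.entries.get(item, (0.0, self.t))
        return count * self.delta ** (self.t - last)
```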
Time-Tilted Window The time-tilted window was introduced by Giannella et al.
[28]. In this model, we are interested in frequent itemsets over a set of windows of
varying width. Each window corresponds to a different time granularity based on its
recency. In the log-time version, each window is twice as wide as its more recent
neighbor. Specifically, the two most recent windows are 1 time unit wide, the one
before that is 2 units wide, and the one before that is 4 units wide. Such a model
allows us to pose more complicated queries over the data stream. Giannella et al. have
developed a variant of the FP-tree, called FP-stream, for dynamically updating frequent
patterns on streaming data and answering approximate frequent-itemset queries over
arbitrary time intervals [28].
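To make the bookkeeping concrete, the sketch below maintains the counts of a single pattern in a logarithmic tilted-time window. It is not the FP-stream structure of [28]; it simply keeps at most two windows of each width 1, 2, 4, ..., merging the two oldest windows of a level and promoting the result whenever a third would appear, which approximates the 1, 1, 2, 4, ... layout described above.

```python
class TiltedTimeWindow:
    """Counts for one pattern over a logarithmic tilted-time window (sketch).

    levels[i] holds at most two counts, each aggregated over a window of width
    2**i time units, ordered newest first. When a third count would appear at a
    level, the two oldest are merged and promoted to level i + 1, so older
    periods are kept at progressively coarser granularity.
    """

    def __init__(self):
        self.levels = []

    def add(self, count):
        carry = count  # count observed in the most recent unit-width window
        for level in self.levels:
            level.insert(0, carry)
            if len(level) <= 2:
                return
            carry = level.pop() + level.pop()  # merge the two oldest windows
        self.levels.append([carry])            # open a new, coarser level

    def windows(self):
        """All stored counts, ordered from the most recent window to the oldest."""
        return [c for level in self.levels for c in level]

# Example: after 7 unit windows with counts 1..7, the stored windows are
# 7 (width 1, time 7), 11 (width 2, times 5-6), and 10 (width 4, times 1-4).
w = TiltedTimeWindow()
for c in range(1, 8):
    w.add(c)
print(w.windows())  # [7, 11, 10]
```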
2.3 Frequent Item Mining
Before looking at the more challenging case of frequent itemset mining, we consider
some algorithms for frequent item mining. Algorithms for this problem fall into two