starting point, a mining result is generated based on the window of data received
so far. As the sequence continues, the window is updated and so are the results.
However, it is not necessarily true that we want to give equal consideration to all
the data received from the start up to the present. Consequently, several standard
data window models exist. A window is a subsequence between the i-th and j-th
transactions, denoted as T_{i,j} = (T_i, T_{i+1}, ..., T_j), where i ≤ j. A user can ask different
types of frequent pattern-mining questions over different types of window models.
Landmark Window In this model, we seek the frequent patterns contained in the
window from a fixed starting timepoint s to the current time t. In other words, we
are trying to find the frequent patterns over the window T_{s,t}. A special case of the
landmark window is when s = 1; in this case, we are interested in the frequent
patterns over the entire data stream. Clearly, the difficulty of solving the special case
of s = 1 is essentially the same as that of the more general case, and all cases require
an efficient single-pass mining algorithm. For simplicity, we will focus on the case
where the full data stream is the window.
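As a rough illustration of single-pass mining over the full stream, the sketch below keeps exact per-item counts from the landmark s = 1 onward. The function name, transaction format, and min_support threshold are choices made for this example only, and it counts single items rather than itemsets; practical stream algorithms would keep an approximate, bounded-memory summary instead of exact counts.

```python
from collections import Counter

def landmark_frequent_items(stream, min_support):
    """Single pass from the landmark s = 1 to the present (illustrative only).

    `stream` yields transactions (iterables of items); exact per-item counts
    are kept, which is feasible only when the number of distinct items is small.
    """
    counts = Counter()
    frequent = set()
    n = 0  # number of transactions seen so far
    for transaction in stream:
        n += 1
        counts.update(set(transaction))  # count each item at most once per transaction
        # Current mining result: items whose support so far meets the threshold.
        frequent = {item for item, c in counts.items() if c >= min_support * n}
    return frequent

# Example: items appearing in at least half of the transactions seen so far.
print(landmark_frequent_items([['a', 'b'], ['a', 'c'], ['a', 'b', 'd']], 0.5))  # {'a', 'b'}
```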
Note that in this model, we treat each timepoint after the starting point as equally
important. However, in many cases, we are more interested in the recent timepoints.
The following models address this issue.
Sliding Window Given a window width w and current timepoint t, we are interested
in the frequent patterns occurring in the window [t − w + 1, t]. As time advances,
the window will keep its width and move along with the current timepoint. In this
model, we are not interested in the data which arrived before the timepoint t − w + 1.
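One simple way to realize this model is to buffer the last w transactions and subtract the contribution of the transaction that slides out of the window. The class below is a sketch under that assumption; it tracks only per-item counts, and the class and method names are illustrative.

```python
from collections import Counter, deque

class SlidingWindowCounts:
    """Per-item counts over the last w transactions (illustrative sketch)."""

    def __init__(self, w):
        self.w = w
        self.window = deque()   # transactions currently inside [t - w + 1, t]
        self.counts = Counter()

    def add(self, transaction):
        items = set(transaction)
        self.window.append(items)
        self.counts.update(items)
        if len(self.window) > self.w:
            # The transaction that arrived at time t - w slides out of the window.
            self.counts.subtract(self.window.popleft())

    def frequent(self, min_support):
        n = len(self.window)
        return {item for item, c in self.counts.items() if n and c >= min_support * n}
```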
Damped Window Model This model assigns greater weight to more recently ar-
rived transactions. A simple way to do that is to define a decay rate δ, 0 < δ < 1 [14].
As each new data transaction arrives, the support levels of the previously recorded
patterns are multiplied by δ to reduce their significance. Thus, a pattern that occurred
k time steps ago has a weight of δ^k. The total support for a pattern is the sum of its
time-decayed counts.
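One way to maintain such time-decayed supports without rescaling every recorded pattern on each arrival is to decay lazily: store the time of the last update and scale by δ raised to the number of elapsed steps when the entry is next touched. The sketch below follows that idea for single items; the class and method names are assumptions made for this example.

```python
class DecayedCounts:
    """Time-decayed item counts for the damped window model (illustrative sketch).

    Each entry stores (decayed_count, last_update_time) and is scaled lazily by
    delta**(elapsed steps), which yields the same weights delta**k as rescaling
    every entry at every step.
    """

    def __init__(self, delta):
        assert 0 < delta < 1
        self.delta = delta
        self.t = 0          # current timestep (one per transaction)
        self.entries = {}   # item -> (decayed_count, last_update_time)

    def add(self, transaction):
        self.t += 1
        for item in set(transaction):
            count, last = self.entries.get(item, (0.0, self.t))
            # Bring the stored count up to the current time, then add weight 1
            # for the occurrence in the transaction that just arrived.
            self.entries[item] = (count * self.delta ** (self.t - last) + 1.0, self.t)

    def support(self, item):
        count, last = self.entries.get(item, (0.0, self.t))
        return count * self.delta ** (self.t - last)
```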
Time-Tilted Window The time-tilted window was introduced by Giannella et al.
[28]. In this model, we are interested in frequent itemsets over a set of windows of
varying width. Each window corresponds to a different time granularity based on its
recency. In the log-time version, each window is twice as wide as its more recent
neighbor. Specifically, the two most recent windows are 1 time unit wide, the one
before that is 2 units wide, and the one before that is 4 units wide. Such a model
allows us to pose more complicated queries over the data stream. Giannella et al. have
developed a variant of the FP-tree, called FP-stream, for dynamically updating frequent
patterns on streaming data and answering approximate frequent-itemset queries over
arbitrary time intervals [28].
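To make the bookkeeping concrete, the sketch below maintains the counts of a single pattern in a logarithmic tilted-time window. It is not the FP-stream structure of [28]; it simply keeps at most two windows of each width 1, 2, 4, ..., merging the two oldest windows of a level and promoting the result whenever a third would appear, which approximates the 1, 1, 2, 4, ... layout described above.

```python
class TiltedTimeWindow:
    """Counts for one pattern over a logarithmic tilted-time window (sketch).

    levels[i] holds at most two counts, each aggregated over a window of width
    2**i time units, ordered newest first. When a third count would appear at a
    level, the two oldest are merged and promoted to level i + 1, so older
    periods are kept at progressively coarser granularity.
    """

    def __init__(self):
        self.levels = []

    def add(self, count):
        carry = count  # count observed in the most recent unit-width window
        for level in self.levels:
            level.insert(0, carry)
            if len(level) <= 2:
                return
            carry = level.pop() + level.pop()  # merge the two oldest windows
        self.levels.append([carry])            # open a new, coarser level

    def windows(self):
        """All stored counts, ordered from the most recent window to the oldest."""
        return [c for level in self.levels for c in level]

# Example: after 7 unit windows with counts 1..7, the stored windows are
# 7 (width 1, time 7), 11 (width 2, times 5-6), and 10 (width 4, times 1-4).
w = TiltedTimeWindow()
for c in range(1, 8):
    w.add(c)
print(w.windows())  # [7, 11, 10]
```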
2.3 Frequent Item Mining
Before looking at the more challenging case of frequent itemset mining, we consider
some algorithms for frequent item mining. Algorithms for this problem fall into two