Frequent Pattern Mining in Data Streams - Frequent Pattern Mining

Database Reference

In-Depth Information

Table 9.4 Non-itemset pattern mining algorithms for streaming data

Publication

Window a

Pattern

Batch?

Accuracy

Algorithm

Teng'03 [ 72 ]

Seq across

window

False

−

FTP-DS

Chang'05 [ 15 ]

Subseq

False +

eISeq

Raïssi'05 [ 69 ]

Subseq

False +

SPEED

Marascu'06 [ 62 ]

Subseq

False

SMDS

Ezeife'07 [ 27 ]

Subseq

False

−

SSM

Chang'08 [ 16 ]

Closed subseq

Exact

SeqStream

Mendes'08 [ 64 ]

Subseq

False +

SS-BE, SS-MB

Koper'11 [ 43 ]

Subseq

False +

SS-BE2, SS-

LC, SS-LC2

Asai'02 [ 6 ]

Subtree

False

−

StreamT

Li'06 [ 54 ]

Subtree

False

−

FQT-Stream

Bifet'08 [ 10 ]

Closed subtree

False

−

AdaTreeNat

Bifet'10 [ 11 ]

Subgraph

False −

AdaGraphMiner

Aggarwal'10 [ 5 ] Dense subgraph L b False ±

a L landmark, S sliding window, D damped, T tilted time window, A all types

4.1

Subsequences

In an early work, Teng et al. [ 72 ] defined a data stream in which each transaction is

an itemset at a certain time and belonging to a particular customer. The algorithm

can combine transactions belonging to the same customer to generate a sequence

over time. Then, the problem is to find item sequences which occur frequently. Later

authors, such as Raïssi et al. [ 69 ], assume that this sort of market basket data is

preprocessed so that the algorithm is presented with a data stream in which each

transaction is already an ordered sequence of items.

It is no longer necessary to merge compatible transactions to form sequences.

Raïssi et al. presented the first algorithm [ 69 ] for this data model. Their algorithm,

SPEED, processes transactions in batches and uses the tilted time window model to

compress older data. It maintains an item table, a sequence table, and a region tree.

In the region tree, each vertex is a transaction, and the parent-child relationship is a

supersequence-subsequence relationship.

SMDS [ 62 ] also uses the tilted time window mode. Its unique feature is that it

incrementally computes sequence (transaction) clusters. The centroid of each cluster

is the basis for frequent sequences: if a cluster has at least θ members, then the items

of the centroid sequence which occur at least θ times define a frequent sequence. A

prefix tree is then used to record these frequent sequences.

The damped window model is used in [ 15 ], which extends the authors' work on

frequent itemsets [ 14 ]. If a pattern is not currently being monitored, its support is

estimated from the support levels of its subpatterns which are being monitored.

The SSM algorithm [ 27 ] uses the landmark window model. There are three key

data strutures. For each batch, SSM records all the input sequences and counts the

frequency of each individual item in a hash table (D-List). Items with frequency less

than are filtered out of the stored input sequences, leaving subsequences made from

Frequent Pattern Mining

Search WWH ::

Custom Search

Home