Database Reference
In-Depth Information
Table 9.4 Non-itemset pattern mining algorithms for streaming data
Publication
Window a
Pattern
Batch?
Accuracy
Algorithm
Teng'03 [ 72 ]
Seq across
window
S
False
FTP-DS
Chang'05 [ 15 ]
Subseq
D
False +
eISeq
Raïssi'05 [ 69 ]
Subseq
T
b
False +
SPEED
Marascu'06 [ 62 ]
Subseq
T
b
False
±
SMDS
Ezeife'07 [ 27 ]
Subseq
L
b
False
SSM
Chang'08 [ 16 ]
Closed subseq
S
Exact
SeqStream
Mendes'08 [ 64 ]
Subseq
L
b
False +
SS-BE, SS-MB
Koper'11 [ 43 ]
Subseq
L
b
False +
SS-BE2, SS-
LC, SS-LC2
Asai'02 [ 6 ]
Subtree
D
False
StreamT
Li'06 [ 54 ]
Subtree
L
False
FQT-Stream
Bifet'08 [ 10 ]
Closed subtree
D
b
False
AdaTreeNat
Bifet'10 [ 11 ]
Subgraph
D
b
False
AdaGraphMiner
Aggarwal'10 [ 5 ] Dense subgraph L b False ±
a L landmark, S sliding window, D damped, T tilted time window, A all types
4.1
Subsequences
In an early work, Teng et al. [ 72 ] defined a data stream in which each transaction is
an itemset at a certain time and belonging to a particular customer. The algorithm
can combine transactions belonging to the same customer to generate a sequence
over time. Then, the problem is to find item sequences which occur frequently. Later
authors, such as Raïssi et al. [ 69 ], assume that this sort of market basket data is
preprocessed so that the algorithm is presented with a data stream in which each
transaction is already an ordered sequence of items.
It is no longer necessary to merge compatible transactions to form sequences.
Raïssi et al. presented the first algorithm [ 69 ] for this data model. Their algorithm,
SPEED, processes transactions in batches and uses the tilted time window model to
compress older data. It maintains an item table, a sequence table, and a region tree.
In the region tree, each vertex is a transaction, and the parent-child relationship is a
supersequence-subsequence relationship.
SMDS [ 62 ] also uses the tilted time window mode. Its unique feature is that it
incrementally computes sequence (transaction) clusters. The centroid of each cluster
is the basis for frequent sequences: if a cluster has at least θ members, then the items
of the centroid sequence which occur at least θ times define a frequent sequence. A
prefix tree is then used to record these frequent sequences.
The damped window model is used in [ 15 ], which extends the authors' work on
frequent itemsets [ 14 ]. If a pattern is not currently being monitored, its support is
estimated from the support levels of its subpatterns which are being monitored.
The SSM algorithm [ 27 ] uses the landmark window model. There are three key
data strutures. For each batch, SSM records all the input sequences and counts the
frequency of each individual item in a hash table (D-List). Items with frequency less
than are filtered out of the stored input sequences, leaving subsequences made from
 
Search WWH ::




Custom Search