Database Reference
In-Depth Information
Table 9.1 Example of
itemset stream
Itemset ID
Contents
1
A,B,D,E
2
B,C
3
A,B,C,D,E
4
B,C,D,E
5
A,C,E
Table 9.2 Common varieties
of patterns from a data steam
Data object Pattern
Item or Itemset Item
Itemset
Subitemset
Sequence
Subsequence
Itemset
Sequence of items spanning
a sequence of itemsets
Tree
Subtree
Graph
Subgraph or subtree
( B , E ), ( D , E ), ( A , B , D ), ( A , B , E ), ( A , D , E ), ( B , D , E ), ( A , B , D , E )
}
, excluding
singleton items. If we set θ
0 . 6, then an itemset must occur in 3 of the 5
objects to be considered frequent. The frequent itemsets are Patt θ = 0 . 6 ( T 1,5 )
=
=
( A , E ), ( B , C ), ( B , D ), ( B , E ), ( D , E ), and ( B , D , E ).
Table 9.2 lists the types of data streams and patterns that are notable in the
literature.
Different types of data suggest different types of patterns. For example, a natural
language text document can be considered either a bag of words, for itemset mining,
or a sequence of words, for sequence mining. There are two major ways of formulat-
ing the subsequence problem. Each data object can itself be a sequence. Alternately,
the data stream itself forms an unending sequence. An XML document, when read
from top to bottom, is a depth-first traversal of a tree, so it may be suitable for subtree
mining.
Arguably the most important frequent pattern mining task is frequent itemset
mining , proposed by Rakesh Agrawal and Srikant in 1993 [ 2 ]. In this setting, each
object in the dataset
T
is a set of items. Let
X
be the set of all possible items in the
= x i 1 ,
, x i | T i | , where
dataset
T
. Then data object T i can be represented as T i
···
X
P
X
x ij
. Note that in this setting, the set
of all possible transactional objects T is the same as
. The pattern space
is the power-set of
P
.
Because the majority of work in mining frequent patterns over data streams focuses
on frequent itemset mining, we devote the major portion of our chapter to itemset
patterns. Many techniques developed in the itemset context can be transferred easily
to mining other types of patterns, such as graph mining [ 38 ].
2.2
Data Windows
In the data stream setting, the sequence of data objects,
),
arrives over time with no known ending time. After some initial delay from the
T
=( T 1 , T 2 ,
···
, T i ,
···
 
Search WWH ::




Custom Search