Database Reference
In-Depth Information
Table 3.1 The transaction
database TDB
tid
Itemset
(Ordered) frequent items
100
f , a , c , d , g , i , m , p
f , c , a , m , p
200
a , b , c , f , l , m , o
f , c , a , b , m
300
b , f , h , j , o
f , b
400
b , c , k , s , p
c , b , p
500
a , f , c , e , l , p , m , n
f , c , a , m , p
In this chapter, we provide an overview of several recently developed frequent
pattern growth mining methods and discuss their implications. The remaining of the
paper is organized as follows. In Sect. 2 , we examine the FP-growth method for
mining frequent itemsets and also mention the CLOSET method for mining frequent
closed itemsets. In Sect. 3 , we look at the impact of FP-growth to constraint-based
mining of frequent patterns and the handling of convertible constraints. In Sect. 4 ,
we introduce two pattern-growth-based methods for mining sequential patterns:
FreeSpan [ 17 ] and PrefixSpan [ 30 ]. In Sect. 6 , we discuss the potential extensions
of pattern-growth methods and conclude our study.
2
FP-Growth: Pattern Growth for Mining Frequent Itemsets
As shown by many researchers [ 1 , 3 ], mining frequent itemsets represents the core
of mining association rules, correlations, and many other patterns.
Let a transaction database TDB consist of a set of transactions in the form of
T
( tid , X ) where tid is a transaction-id and X an itemset (i.e., a set of items).
A transaction T is said to contain itemset Y if and only if Y
=
X . The support of
an itemset W in TDB , denoted as sup ( W ), is the number of transactions in TDB
containing W . Given a user-specified minimum support threshold, min _ sup , W
is frequent if and only if sup ( W )
min _ sup . The problem of mining frequent
itemsets is to find the complete set of frequent itemsets in a transaction database
TDB w.r.t. given support threshold min_sup .
Here we examine how one can develop a pattern growth method, FP-growth [ 18 ],
for efficient mining of frequent itemsets in large databases. FP-growth first performs a
frequent item-based database projection when the database is large and then switches
to main-memory-based mining by constructing a compact data structure, called FP-
tree, and transforming mining database into mining this compact tree. We first show
how FP-tree be constructed from a database.
Example 1 (FP-tree)Let the transaction database, DB , be (the first two columns of)
Table 3.1 and the minimum support threshold be 3.
First, a scan of DB derives a list of frequent items,
( f : 4), ( c : 4), ( a : 3), ( b :
3), ( m : 3), ( p :3)
, (the number after “:” indicates the support), and with items
ordered in frequency descending order. This ordering is important since each path of
a tree will follow this order. For convenience of later discussions, the frequent items
in each transaction are listed in this ordering in the rightmost column of Table 3.1 .
 
Search WWH ::




Custom Search