Database Reference
In-Depth Information
Table 3.1
The transaction
database
TDB
tid
Itemset
(Ordered) frequent items
100
f
,
a
,
c
,
d
,
g
,
i
,
m
,
p
f
,
c
,
a
,
m
,
p
200
a
,
b
,
c
,
f
,
l
,
m
,
o
f
,
c
,
a
,
b
,
m
300
b
,
f
,
h
,
j
,
o
f
,
b
400
b
,
c
,
k
,
s
,
p
c
,
b
,
p
500
a
,
f
,
c
,
e
,
l
,
p
,
m
,
n
f
,
c
,
a
,
m
,
p
In this chapter, we provide an overview of several recently developed frequent
pattern growth mining methods and discuss their implications. The remaining of the
paper is organized as follows. In Sect.
2
, we examine the FP-growth method for
mining frequent itemsets and also mention the CLOSET method for mining frequent
closed itemsets. In Sect.
3
, we look at the impact of FP-growth to constraint-based
mining of frequent patterns and the handling of convertible constraints. In Sect.
4
,
we introduce two pattern-growth-based methods for mining sequential patterns:
FreeSpan [
17
] and PrefixSpan [
30
]. In Sect.
6
, we discuss the potential extensions
of pattern-growth methods and conclude our study.
2
FP-Growth: Pattern Growth for Mining Frequent Itemsets
As shown by many researchers [
1
,
3
], mining frequent itemsets represents the core
of mining association rules, correlations, and many other patterns.
Let a
transaction database TDB
consist of a set of transactions in the form of
T
(
tid
,
X
) where
tid
is a
transaction-id
and
X
an itemset (i.e., a set of items).
A transaction
T
is said to
contain
itemset
Y
if and only if
Y
=
X
. The support of
an itemset
W
in
TDB
, denoted as
sup
(
W
), is the number of transactions in
TDB
containing
W
. Given a user-specified minimum support threshold,
min
_
sup
,
W
is frequent if and only if
sup
(
W
)
⊆
min
_
sup
. The problem of mining frequent
itemsets is to
find the complete set of frequent itemsets in a transaction database
TDB w.r.t. given support threshold min_sup
.
Here we examine how one can develop a pattern growth method, FP-growth [
18
],
for efficient mining of frequent itemsets in large databases. FP-growth first performs a
frequent item-based database projection when the database is large and then switches
to main-memory-based mining by constructing a compact data structure, called FP-
tree, and transforming mining database into mining this compact tree. We first show
how FP-tree be constructed from a database.
≥
Example 1
(FP-tree)Let the transaction database,
DB
, be (the first two columns of)
Table
3.1
and the minimum support threshold be 3.
First, a scan of
DB
derives a
list
of frequent items,
(
f
: 4), (
c
: 4), (
a
: 3), (
b
:
3), (
m
: 3), (
p
:3)
, (the number after “:” indicates the support), and with items
ordered in frequency descending order. This ordering is important since each path of
a tree will follow this order. For convenience of later discussions, the frequent items
in each transaction are listed in this ordering in the rightmost column of Table
3.1
.