Databases Reference
In-Depth Information
Algorithm
:
FP growth.
Mine frequent itemsets using an FP-tree by pattern fragment growth.
Input
:
D
, a transaction database;
min sup
, the minimum support count threshold.
Output
: The complete set of frequent patterns.
Method
:
1.
The FP-tree is constructed in the following steps:
(a)
Scan the transaction database
D
once. Collect
F
, the set of frequent items, and their
support counts. Sort
F
in support count descending order as
L
, the
list
of frequent items.
(b)
Create the root of an FP-tree, and label it as “null.” For each transaction
Trans
in
D
do the
following.
Select and sort the frequent items in
Trans
according to the order of
L
. Let the sorted
frequent item list in
Trans
be [
p
j
P
], where
p
is the first element and
P
is the remaining
list. Call
insert
tree
.
, which is performed as follows. If
T
has a child
N
such that
N.item-name
D
p.item-name
, then increment
N
's count by 1; else create a new node
N
,
and let its count be 1, its parent link be linked to
T
, and its node-link to the nodes with
the same
item-name
via the node-link structure. If
P
is nonempty, call
inserttree
.
P
,
N
/
recursively.
2.
The FP-tree is mined by calling
FPgrowth
.
FP tree
,
null
/, which is implemented as follows.
procedureFPgrowth
(
Tree,
)
(1)
[
p
j
P
],
T
/
if
Tree
contains a single path
P
then
(2)
for each
combination (denoted as ) of the nodes in the path
P
(3)
generate pattern [ with
support count
=
minimum support count of nodes in
;
(4)
else for each
a
i
in the header of
Tree
f
(5)
generate pattern D
a
i
[ with
support count
D
a
i
.
support count
;
(6)
construct
's conditional pattern base and then
's conditional FP tree
Tree
;
(7)
if
Tree
6D;
then
call
FPgrowth
(
Tree
,
(8)
); g
Figure 6.9
FP-growth algorithm for discovering frequent itemsets without candidate generation.
(i.e., f
item
:
TID set
g/
, where
item
is an item name, and
TID set
is the set of transaction
identifiers containing the item. This is known as the
vertical data format
.
In this subsection, we look at how frequent itemsets can also be mined effi-
ciently using vertical data format, which is the essence of the
Eclat
(Equivalence Class
Transformation) algorithm.
Example 6.6
Mining frequent itemsets using the vertical data format.
Consider the horizontal
data format of the transaction database,
D
, of Table 6.1 in Example 6.3. This can be
transformed into the vertical data format shown in Table 6.3 by scanning the data
set once.
Mining can be performed on this data set by intersecting the TID sets of every pair
of frequent single items. The minimum support count is 2. Because every single item is