Databases Reference
In-Depth Information
Algorithm : FP growth. Mine frequent itemsets using an FP-tree by pattern fragment growth.
Input :
D , a transaction database;
min sup , the minimum support count threshold.
Output : The complete set of frequent patterns.
Method :
1. The FP-tree is constructed in the following steps:
(a) Scan the transaction database D once. Collect F , the set of frequent items, and their
support counts. Sort F in support count descending order as L , the list of frequent items.
(b) Create the root of an FP-tree, and label it as “null.” For each transaction Trans in D do the
following.
Select and sort the frequent items in Trans according to the order of L . Let the sorted
frequent item list in Trans be [ p j P ], where p is the first element and P is the remaining
list. Call insert tree .
, which is performed as follows. If T has a child N such that
N.item-name D p.item-name , then increment N 's count by 1; else create a new node N ,
and let its count be 1, its parent link be linked to T , and its node-link to the nodes with
the same item-name via the node-link structure. If P is nonempty, call inserttree . P , N /
recursively.
2. The FP-tree is mined by calling FPgrowth . FP tree , null /, which is implemented as follows.
procedureFPgrowth ( Tree, )
(1)
[ p j P ], T
/
if Tree contains a single path P then
(2)
for each combination (denoted as ) of the nodes in the path P
(3)
generate pattern [ with support count = minimum support count of nodes in ;
(4)
else for each a i in the header of Tree f
(5)
generate pattern D a i [ with support count D a i . support count ;
(6)
construct
's conditional pattern base and then
's conditional FP tree Tree ;
(7)
if Tree 6D; then
call FPgrowth ( Tree ,
(8)
); g
Figure 6.9 FP-growth algorithm for discovering frequent itemsets without candidate generation.
(i.e., f item : TID set g/
, where item is an item name, and TID set is the set of transaction
identifiers containing the item. This is known as the vertical data format .
In this subsection, we look at how frequent itemsets can also be mined effi-
ciently using vertical data format, which is the essence of the Eclat (Equivalence Class
Transformation) algorithm.
Example 6.6 Mining frequent itemsets using the vertical data format. Consider the horizontal
data format of the transaction database, D , of Table 6.1 in Example 6.3. This can be
transformed into the vertical data format shown in Table 6.3 by scanning the data
set once.
Mining can be performed on this data set by intersecting the TID sets of every pair
of frequent single items. The minimum support count is 2. Because every single item is
 
Search WWH ::




Custom Search