Mining Frequent Patterns, Associations, and Correlations: Basic Concepts and Methods - Data Mining: Concepts and Techniques - page 260

Databases Reference

In-Depth Information

Algorithm : FP growth. Mine frequent itemsets using an FP-tree by pattern fragment growth.

Input :

D , a transaction database;

min sup , the minimum support count threshold.

Output : The complete set of frequent patterns.

Method :

1. The FP-tree is constructed in the following steps:

(a) Scan the transaction database D once. Collect F , the set of frequent items, and their

support counts. Sort F in support count descending order as L , the list of frequent items.

(b) Create the root of an FP-tree, and label it as “null.” For each transaction Trans in D do the

following.

Select and sort the frequent items in Trans according to the order of L . Let the sorted

frequent item list in Trans be [ p j P ], where p is the first element and P is the remaining

list. Call insert tree .

, which is performed as follows. If T has a child N such that

N.item-name D p.item-name , then increment N 's count by 1; else create a new node N ,

and let its count be 1, its parent link be linked to T , and its node-link to the nodes with

the same item-name via the node-link structure. If P is nonempty, call inserttree . P , N /

recursively.

2. The FP-tree is mined by calling FPgrowth . FP tree , null /, which is implemented as follows.

procedureFPgrowth ( Tree, )

(1)

[ p j P ], T

/

if Tree contains a single path P then

(2)

for each combination (denoted as ) of the nodes in the path P

(3)

generate pattern [ with support count = minimum support count of nodes in ;

(4)

else for each a i in the header of Tree f

(5)

generate pattern D a i [ with support count D a i . support count ;

(6)

construct

's conditional pattern base and then

's conditional FP tree Tree ;

(7)

if Tree 6D; then

call FPgrowth ( Tree ,

(8)

); g

Figure 6.9 FP-growth algorithm for discovering frequent itemsets without candidate generation.

(i.e., f item : TID set g/

, where item is an item name, and TID set is the set of transaction

identifiers containing the item. This is known as the vertical data format .

In this subsection, we look at how frequent itemsets can also be mined effi-

ciently using vertical data format, which is the essence of the Eclat (Equivalence Class

Transformation) algorithm.

Example 6.6 Mining frequent itemsets using the vertical data format. Consider the horizontal

data format of the transaction database, D , of Table 6.1 in Example 6.3. This can be

transformed into the vertical data format shown in Table 6.3 by scanning the data

set once.

Mining can be performed on this data set by intersecting the TID sets of every pair

of frequent single items. The minimum support count is 2. Because every single item is

Next Page

Data Mining: Concepts and Techniques

Search WWH ::

Custom Search

Home