Mining Frequent Patterns, Associations, and Correlations: Basic Concepts and Methods - Data Mining: Concepts and Techniques - page 253

Databases Reference

In-Depth Information

Algorithm: Apriori. Find frequent itemsets using an iterative level-wise approach based

on candidate generation.

Input:

D , a database of transactions;

min sup , the minimum support count threshold.

Output: L , frequent itemsets in D .

Method:

(1)

L 1 = find frequent 1-itemsets(D);

(2)

for ( k D 2; L k 1 6D

; k CC) f

C k = apriorigen ( L k 1 );

(3)

(4)

for each transaction t 2 D f // scan D for counts

(5)

C t = subset

.

C k , t

/

; // get the subsets of t that are candidates

(6)

for each candidate c 2 C t

(7)

c.countCC;

(8)

g

(9)

L k Df c 2 C k j c . count min sup g

(10) g

(11) return L D[ k L k ;

procedureapriorigen ( L k 1 :frequent . k 1/-itemsets)

(1) for each itemset l 1 2 L k 1

(2) for each itemset l 2 2 L k 1

(3) if ( l 1 [1] D l 2 [1]/^. l 1 [2] D l 2 [2]/

^...^. l 1 [ k 2] D l 2 [ k 2]/^. l 1 [ k 1] < l 2 [ k 1]/ then f

(4) c D l 1 1 l 2 ; // join step: generate candidates

(5) if hasinfrequentsubset ( c , L k 1 ) then

(6) delete c ; // prune step: remove unfruitful candidate

(7) else add c to C k ;

(8) g

(9) return C k ;

procedurehasinfrequentsubset ( c : candidate k -itemset;

L k 1 : frequent

. k 1

/

-itemsets); // use prior knowledge

(1)

-subset s of c

(2) if s 62 L k 1 then

(3) return TRUE;

(4)

for each .

k 1

/

return FALSE;

Figure 6.4 Apriori algorithm for discovering frequent itemsets for mining Boolean association rules.

A procedure can then be called to generate association rules from the frequent itemsets.

Such a procedure is described in Section 6.2.2.

The apriorigen procedure performs two kinds of actions, namely, join and prune , as

described before. In the join component, L k 1 is joined with L k 1 to generate potential

candidates (steps 1-4). The prune component (steps 5-7) employs the Apriori property

to remove candidates that have a subset that is not frequent. The test for infrequent

subsets is shown in procedure hasinfrequentsubset .

Next Page

Data Mining: Concepts and Techniques

Search WWH ::

Custom Search

Home