Databases Reference
In-Depth Information
Algorithm: Apriori.
Find frequent itemsets using an iterative level-wise approach based
on candidate generation.
Input:
D
, a database of transactions;
min sup
, the minimum support count threshold.
Output:
L
, frequent itemsets in
D
.
Method:
(1)
L
1
= find frequent 1-itemsets(D);
(2)
for
(
k
D 2;
L
k
1
6D
;
k
CC) f
C
k
=
apriorigen
(
L
k
1
);
(3)
(4)
for each
transaction
t
2
D
f // scan
D
for counts
(5)
C
t
= subset
.
C
k
,
t
/
; // get the subsets of
t
that are candidates
(6)
for each
candidate
c
2
C
t
(7)
c.countCC;
(8)
g
(9)
L
k
Df
c
2
C
k
j
c
.
count
min sup
g
(10) g
(11)
return
L
D[
k
L
k
;
procedureapriorigen
(
L
k
1
:frequent .
k
1/-itemsets)
(1)
for each
itemset
l
1
2
L
k
1
(2)
for each
itemset
l
2
2
L
k
1
(3)
if
(
l
1
[1] D
l
2
[1]/^.
l
1
[2] D
l
2
[2]/
^...^.
l
1
[
k
2] D
l
2
[
k
2]/^.
l
1
[
k
1] <
l
2
[
k
1]/
then
f
(4)
c
D
l
1
1
l
2
; // join step: generate candidates
(5)
if
hasinfrequentsubset
(
c
,
L
k
1
)
then
(6)
delete
c
; // prune step: remove unfruitful candidate
(7)
else add
c
to
C
k
;
(8) g
(9)
return
C
k
;
procedurehasinfrequentsubset
(
c
: candidate
k
-itemset;
L
k
1
: frequent
.
k
1
/
-itemsets); // use prior knowledge
(1)
-subset
s
of
c
(2)
if
s
62
L
k
1
then
(3)
return
TRUE;
(4)
for each
.
k
1
/
return
FALSE;
Figure 6.4
Apriori algorithm for discovering frequent itemsets for mining Boolean association rules.
A procedure can then be called to generate association rules from the frequent itemsets.
Such a procedure is described in Section 6.2.2.
The
apriorigen
procedure performs two kinds of actions, namely,
join
and
prune
, as
described before. In the join component,
L
k
1
is joined with
L
k
1
to generate potential
candidates (steps 1-4). The prune component (steps 5-7) employs the Apriori property
to remove candidates that have a subset that is not frequent. The test for infrequent
subsets is shown in procedure
hasinfrequentsubset
.