Information Technology Reference
In-Depth Information
(1)
L
1
= find_frequent_1-itemsets(DB);
(2) for (
k
=2;
L
k-1
≠∅;
k
++){
(3) C
k
= apriori_gen(
L
k-1
);
(4) for each transaction
T
∈
DB{ //scan DB for counts
(5)
C
T
= subset(C
k
,
T
); //get the subsets of
T
that are candidates
(6)
for each candidate
c
∈
C
T
(7)
c
.count++
(8)
}
(9)
L
k
=
{
c
∈
C
k
|
c
.count
≥
minsup}
(10)
}
(11) return
L =
⋃
k
L
k
Procedure
apriori
_
gen(
L
k-1
: frequent(
k
-1)-itemsets)
(1) for each itemset
l
1
∈
L
k-1
(2)
for each itemset
l
2
∈
L
k-1
(3) if(
l
1
[1]=
l
2
[1])
⋀(
l
1
[2]=
l
2
[2])
⋀⋯ ⋀(
l
1
[
k
-2]=
l
2
[
k
-2])
⋀(
l
1
[
k
-1]=
l
2
[
k
-1])then{
(4)
c
=
l
1
⋈
l
2
; //join step: generate candidates
(5) if has_infrenquent_subset(
c
,
L
k-1
) then
(6) delete
c
; //prune step: remove unfruitful candidate
(7) else add
c
to
C
k
;
(8) }
(9) return
C
k.
Procedure has_infrequent_subset(
c:
candidate
k-
itemset;
-1)-itemsets; //use prior knowledge
(1) for each (
k
-1)-subset
s
of
c
L
k-1
: frequent(
k
(2) if
s
∉
L
k
+1
then
(3) return TRUE;
(4) return FALSE.
-itemsets in the k
times. If the number of element in the top itemset is k at most, then the algorithm
will scan DB
Apriori algorithm scans DB several times and calculates
k
+1 time. Hash approach was used to improve Apriori
algorithm. It minus the verified record
k
times, or
k
C
k
by the
means of reducing the number of candidate item sets, reducing the length of
T
to support the calculation of