Information Technology Reference
In-Depth Information
(1) L 1 = find_frequent_1-itemsets(DB);
(2) for ( k =2; L k-1 ≠∅;
k ++){
(3) C k = apriori_gen( L k-1 );
(4) for each transaction T
DB{ //scan DB for counts
(5)
C T = subset(C k ,
T
); //get the subsets of
T
that are candidates
(6)
for each candidate c
C T
(7)
c .count++
(8)
}
(9)
L k = { c
C k | c .count minsup}
(10) }
(11) return L =
k L k
Procedure apriori _ gen(
L k-1 : frequent(
k
-1)-itemsets)
(1) for each itemset l 1 L k-1
(2) for each itemset l 2 L k-1
(3) if( l 1 [1]= l 2 [1]) ⋀( l 1 [2]= l 2 [2]) ⋀⋯ ⋀( l 1 [ k -2]= l 2 [ k -2]) ⋀( l 1 [ k -1]= l 2 [ k -1])then{
(4) c = l 1 l 2 ; //join step: generate candidates
(5) if has_infrenquent_subset( c , L k-1 ) then
(6) delete c ; //prune step: remove unfruitful candidate
(7) else add c to C k ;
(8) }
(9) return C k.
Procedure has_infrequent_subset(
c:
candidate
k-
itemset;
-1)-itemsets; //use prior knowledge
(1) for each ( k -1)-subset s of c
L k-1 : frequent(
k
(2) if s L k +1 then
(3) return TRUE;
(4) return FALSE.
-itemsets in the k
times. If the number of element in the top itemset is k at most, then the algorithm
will scan DB
Apriori algorithm scans DB several times and calculates
k
+1 time. Hash approach was used to improve Apriori
algorithm. It minus the verified record
k
times, or
k
C k by the
means of reducing the number of candidate item sets, reducing the length of
T
to support the calculation of
Search WWH ::




Custom Search