Database Reference
In-Depth Information
5.2 Apriori Algorithm
The Apriori algorithm takes a bottom-up iterative approach to uncovering the
frequent itemsets by first determining all the possible items (or 1-itemsets, for
example {bread} , {eggs} , {milk} , …) and then identifying which among them
are frequent.
Assuming the minimum support threshold (or the minimum support criterion) is
set at 0.5, the algorithm identifies and retains those itemsets that appear in at least
50% of all transactions and discards (or “prunes away”) the itemsets that have a
support less than 0.5 or appear in fewer than 50% of the transactions. The word
prune is used like it would be in gardening, where unwanted branches of a bush
are clipped away.
In the next iteration of the Apriori algorithm, the identified frequent 1-itemsets
are paired into 2-itemsets (for example, {bread,eggs} , {bread,milk} ,
{eggs,milk} , …) and again evaluated to identify the frequent 2-itemsets among
them.
At each iteration, the algorithm checks whether the support criterion can be met;
if it can, the algorithm grows the itemset, repeating the process until it runs out of
support or until the itemsets reach a predefined length. The Apriori algorithm [5] is
given next. Let variable be the set of candidate k -itemsets and variable be the
set of k -itemsets that satisfy the minimum support. Given a transaction database
, a minimum support threshold , and an optional parameter indicating the
maximum length an itemset could reach, Apriori iteratively computes frequent
itemsets
based on .
1 Apriori ( , , )
2
3 {1-itemsets that satisfy minimum support }
4 while
5 if
6 ← candidate itemsets generated from
7 for each transaction
in database
do
8 increment the counts of
contained in
9
candidates in
that satisfy minimum support
Search WWH ::




Custom Search