Information Technology Reference
In-Depth Information
Note that among the four parameters confidence and support are used
frequently.
12.2 The Apriori Algorithm
The association rule mining is to discover the rules with the minium support
(minsup) and the minium confidence (minconf) defined by users from transaction
databases. The problem of association rule mining can be decomposed into the
following two subproblem.
(1) Find all large itemsets (or frequent itemsets) in a transaction database. The
large itemset is defined as the itemset
X
with support(
X
) no less than the minsup
defined by user.
(2) Generate association rules from large itemset. For every large itemset
A
, if
B A B
¬ and confidence(
B
¼ (
A B
))≥ minconf then constuct the rule
B
¼
(
).
The second subproblem is easier than the first one. Most research works focus
on the first subproblem. So we introduce the algorithm which is concerned with
the first subproblem.
The famous Apriori algorithm is concentrate on mining frequent itemsets for
Boolean association rules(Agrawal 1993, Agrawal, 1994 ). The name of the
algorithm is based on the fact that the algorithm uses prior knowledge of frequent
itermset properties. Apriori employs breadth-first search, where
A B
k
-itemsets are
used to explore (
+1)-itemsets. First, the set of frequent 1-itemsets is found by
scanning the database to accumulate the count for each item, and collecting those
items that satisfy minimum support. The resulting set is denoted
k
L 1 . Next,
L
1 is
used to find
L 2 , the set of frequent 2-itemsets, which is used to find
L
3 , and so on,
until no more frequent
k
-itemsets can be found. We give the classical Apriori
algorithm as follows.
Algorithm 12.1 Apriori Algorithm(Agrawal, 1994).
Input: DB, a database of transactions;
minsup, the minimum support count threshold.
Output: L , frequent itemsets in DB
Method:
Search WWH ::




Custom Search