6.2.2 Generating Association Rules from Frequent Itemsets
Once the frequent itemsets from transactions in a database D have been found, it is
straightforward to generate strong association rules from them (where strong association
rules satisfy both minimum support and minimum confidence). This can be done
using Eq. (6.4) for confidence, which we show again here for completeness:
$$\text{confidence}(A \Rightarrow B) = P(B \mid A) = \frac{\text{support\_count}(A \cup B)}{\text{support\_count}(A)}.$$
The conditional probability is expressed in terms of itemset support count, where
$\text{support\_count}(A \cup B)$ is the number of transactions containing the itemsets $A \cup B$, and
$\text{support\_count}(A)$ is the number of transactions containing the itemset $A$. Based on this
equation, association rules can be generated as follows:
For each frequent itemset $l$, generate all nonempty subsets of $l$.
For every nonempty subset $s$ of $l$, output the rule “$s \Rightarrow (l - s)$” if
$\frac{\text{support\_count}(l)}{\text{support\_count}(s)} \geq \text{min\_conf}$, where $\text{min\_conf}$ is the minimum confidence threshold.
Because the rules are generated from frequent itemsets, each one automatically satisfies
the minimum support. Frequent itemsets can be stored ahead of time in hash tables
along with their counts so that they can be accessed quickly.
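The procedure above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the function name `generate_rules` and the dictionary `support_count` are hypothetical, and a plain `dict` keyed by `frozenset` stands in for the hash table of frequent itemsets and their counts. The sketch assumes the table holds every nonempty subset of each frequent itemset (which holds because every subset of a frequent itemset is itself frequent), and it considers only proper subsets s so that the consequent l - s is never empty.

```python
from itertools import combinations


def generate_rules(support_count, min_conf):
    """Return (antecedent, consequent, confidence) for every strong rule.

    support_count: dict mapping frozenset (frequent itemset) -> support count.
    min_conf: minimum confidence threshold as a fraction, e.g. 0.7.
    """
    rules = []
    for itemset, count in support_count.items():
        # A rule needs at least one item on each side.
        if len(itemset) < 2:
            continue
        # Enumerate every nonempty proper subset s of the itemset l.
        for size in range(1, len(itemset)):
            for subset in combinations(sorted(itemset), size):
                s = frozenset(subset)
                # confidence(s => l - s) = support_count(l) / support_count(s)
                conf = count / support_count[s]
                if conf >= min_conf:
                    rules.append((s, itemset - s, conf))
    return rules
```

Each lookup `support_count[s]` is a constant-time hash access on average, which is the point of storing the counts ahead of time.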
Example 6.4 Generating association rules. Let's try an example based on the transactional data for
AllElectronics shown before in Table 6.1. The data contain frequent itemset X = {I1, I2,
I5}. What are the association rules that can be generated from X? The nonempty subsets
of X are {I1, I2}, {I1, I5}, {I2, I5}, {I1}, {I2}, and {I5}. The resulting association rules are
as shown below, each listed with its confidence:
$\{I1, I2\} \Rightarrow I5$, confidence $= 2/4 = 50\%$
$\{I1, I5\} \Rightarrow I2$, confidence $= 2/2 = 100\%$
$\{I2, I5\} \Rightarrow I1$, confidence $= 2/2 = 100\%$
$I1 \Rightarrow \{I2, I5\}$, confidence $= 2/6 = 33\%$
$I2 \Rightarrow \{I1, I5\}$, confidence $= 2/7 = 29\%$
$I5 \Rightarrow \{I1, I2\}$, confidence $= 2/2 = 100\%$
If the minimum confidence threshold is, say, 70%, then only the second, third, and
last rules are output, because these are the only ones generated that are strong. Note
that, unlike conventional classification rules, association rules can contain more than
one conjunct on the right side of the rule.
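The arithmetic in Example 6.4 can be checked with a short Python sketch. The support counts below are the numerators and denominators of the confidence fractions listed in the example (for instance, 2/6 for I1 means X has count 2 and {I1} has count 6). The variable names are illustrative assumptions.

```python
# Support counts read off the confidence fractions in Example 6.4.
support_count = {
    frozenset({"I1", "I2"}): 4,
    frozenset({"I1", "I5"}): 2,
    frozenset({"I2", "I5"}): 2,
    frozenset({"I1"}): 6,
    frozenset({"I2"}): 7,
    frozenset({"I5"}): 2,
    frozenset({"I1", "I2", "I5"}): 2,
}

X = frozenset({"I1", "I2", "I5"})
min_conf = 0.70  # the 70% threshold used in the example

strong = []
for s, count in support_count.items():
    if s < X:  # s is a nonempty proper subset of X
        conf = support_count[X] / count
        print(f"{sorted(s)} => {sorted(X - s)}: confidence = {conf:.0%}")
        if conf >= min_conf:
            strong.append((sorted(s), sorted(X - s), conf))
```

After the loop, `strong` holds the three rules with 100% confidence, matching the second, third, and last rules in the list.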
6.2.3 Improving the Efficiency of Apriori
“How can we further improve the efficiency of Apriori-based mining?” Many variations of
the Apriori algorithm have been proposed that focus on improving the efficiency of the
original algorithm. Several of these variations are summarized as follows:
 