Database Reference
In-Depth Information
appropriate tables over which the algorithms can be applied. For example,
the Northwind data warehouse is an appropriate source of data to find
associations between items. Conceptually, the
Sales
fact table contains data
of all customer purchases. Further, the
OrderNo
and
OrderLineNo
can be used
to obtain the items that are purchased together. With this information, the
Northwind management can discover which are the items that are frequently
ordered together. We show a concrete example in Sect.
9.1.7
.
Let
I
=
{i
1
,i
2
,...,i
m
}
be a set of literals, called items. A set of items
is called an
itemset
.Let
be a set of transactions, where each transaction
T
is an itemset such that
T ⊆I
D
.Letalso
X
be an itemset. A transaction
T
is said to contain
X
if and only if
X ⊆ T
.An
association rule
is an
implication of the form
X ⇒ Y
,where
X ⊂I,Y ⊂I
,and
X ∩ Y
=
∅
.The
rule
X ⇒ Y
holds in
D
with
confidence
c
,if
c
% of the transactions in
D
that contain
X
also contain
Y
.Therule
X ⇒ Y
has
support
s
in
D
if
s
%
of the transactions in
contain
X ∪ Y
.
We illustrate the problem with a simple but typical example. Consider
the following table containing a collection of transactions corresponding to
purchases in a supermarket. A transaction is identified by a
TransactionID
attribute and a list of items included in the transaction.
D
TransactionId
Items
1000
{
1,2,3
}
2000
{
1,3
}
3000
{
1,4
}
4000
{
2,5,6
}
3
with support 50% and confidence 66%, which means half of the transactions
include items
1
and
3
, and from the three transactions that include item
1
,
two of them include item
3
.Thatis,
c
=
4
and
s
=
3
. Analogously, the rule
3
From this data set, we can obtain at least the following rules. First,
1
⇒
1
also holds with support 50% (for the same reason of the previous rule)
and confidence 100%. The latter is because all the transactions containing
item
3
also contain item
1
.
In general, algorithms for finding association rules consist in two main
stages:
⇒
1. Generate the
frequent itemsets
, which finds all the itemsets that satisfy
a minimum support (
minsup
) threshold.
2. Generate the association rules, which extracts all the high-confidence rules
from the frequent itemsets found in the previous step. These are called
strong rules
.
The first operation above is critical since generating the rules once the
frequent itemsets have been found is straightforward.
Another concept worth to be commented is the one of
interesting rules
.
We say an association rule
A ⇒ B
is
interesting
if its confidence exceeds a
Search WWH ::
Custom Search