Thus, we define the interest of an association rule I → j to be the difference between its
confidence and the fraction of baskets that contain j. That is, if I has no influence on j, then
we would expect that the fraction of baskets including I that contain j would be exactly the
same as the fraction of all baskets that contain j. Such a rule has interest 0. However, a rule is
interesting, in both the informal and technical sense, if it has either high interest, meaning
that the presence of I in a basket somehow causes the presence of j, or highly negative
interest, meaning that the presence of I discourages the presence of j.
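Written as a formula, the definition reads as follows. The notation is an assumption made here for illustration: n denotes the total number of baskets, support(X) the number of baskets containing every item of X, and confidence is taken in its usual form, support(I ∪ {j}) / support(I).

```latex
\[
\operatorname{interest}(I \to j)
  = \operatorname{conf}(I \to j) - \frac{\operatorname{support}(\{j\})}{n}
  = \frac{\operatorname{support}(I \cup \{j\})}{\operatorname{support}(I)}
    - \frac{\operatorname{support}(\{j\})}{n}.
\]
```

The value is 0 when I has no influence on j, positive when baskets containing I hold j more often than baskets in general, and negative when they hold it less often.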
EXAMPLE 6.3 The story about beer and diapers is really a claim that the association rule
{diapers} → beer has high interest. That is, the fraction of diaper-buyers who buy beer is
significantly greater than the fraction of all customers that buy beer. An example of a rule
with negative interest is {coke} → pepsi. That is, people who buy Coke are unlikely to buy
Pepsi as well, even though a good fraction of all people buy Pepsi; people typically prefer
one or the other, but not both. Similarly, the rule {pepsi} → coke can be expected to have
negative interest.
For some numerical calculations, let us return to the data of Fig. 6.1. The rule {dog} →
cat has confidence 5/7, since “dog” appears in seven baskets, of which five have “cat.”
However, “cat” appears in six out of the eight baskets, so we would expect that 75% of the
seven baskets with “dog” would have “cat” as well. Thus, the interest of the rule is 5/7 −
3/4 = −0.036, which is essentially 0. The rule {cat} → kitten has interest 1/6 − 1/8 = 0.042.
The justification is that one out of the six baskets with “cat” has “kitten” as well, while
“kitten” appears in only one of the eight baskets. This interest, while positive, is close to 0
and therefore indicates the association rule is not very “interesting.”
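The arithmetic in this example can be checked with a short Python sketch. The function names and the count-based signature are hypothetical, chosen here for illustration. The inputs are only the basket counts quoted in the text, not the full data of Fig. 6.1.

```python
from fractions import Fraction


def confidence(count_i_and_j, count_i):
    """Fraction of the baskets containing I that also contain j."""
    return Fraction(count_i_and_j, count_i)


def interest(count_i_and_j, count_i, count_j, n_baskets):
    """Confidence of I -> j minus the fraction of all baskets containing j."""
    return confidence(count_i_and_j, count_i) - Fraction(count_j, n_baskets)


# {dog} -> cat: "dog" is in 7 baskets, 5 of them also have "cat";
# "cat" is in 6 of the 8 baskets overall.
dog_cat = interest(count_i_and_j=5, count_i=7, count_j=6, n_baskets=8)

# {cat} -> kitten: "cat" is in 6 baskets, 1 of them also has "kitten";
# "kitten" is in 1 of the 8 baskets overall.
cat_kitten = interest(count_i_and_j=1, count_i=6, count_j=1, n_baskets=8)

print(dog_cat, round(float(dog_cat), 3))        # -1/28 -0.036
print(cat_kitten, round(float(cat_kitten), 3))  # 1/24 0.042
```

Exact fractions are used so that the rounded values can be compared directly with the figures in the text.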
6.1.4 Finding Association Rules with High Confidence
Identifying useful association rules is not much harder than finding frequent itemsets. We
shall take up the problem of finding frequent itemsets in the balance of this chapter, but
for the moment, assume it is possible to find those frequent itemsets whose support is at
or above a support threshold s. If we are looking for association rules I → j that apply to a
reasonable fraction of the baskets, then the support of I must be reasonably high. In practice,
such as for marketing in brick-and-mortar stores, “reasonably high” is often around
1% of the baskets. We also want the confidence of the rule to be reasonably high, perhaps
50%, or else the rule has little practical effect. As a result, the set I ∪ {j} will also have
fairly high support.
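The last claim follows directly from the definition of confidence. As a sketch, let s be the support threshold that I meets and c a lower bound on the confidence of the rule (the symbol c is introduced here for illustration only):

```latex
\[
\operatorname{support}(I \cup \{j\})
  = \operatorname{conf}(I \to j) \cdot \operatorname{support}(I)
  \ge c \cdot s.
\]
```

With the illustrative figures above, support of I at 1% of the baskets and confidence of 50%, the set I ∪ {j} appears in at least 0.5% of the baskets.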
Suppose we have found all itemsets that meet a threshold of support, and that we have
the exact support calculated for each of these itemsets. We can find within them all the association
rules that have both high support and high confidence. That is, if J is a set of n