Table 6.2 Example 1 data

              CM     ¬CM    Row total
SM            20     60     80
¬SM           20      0     20
Column total  40     60     100
2. The length of the transactions in the database increases dramatically when
negative items are considered. Picture the length of a transaction in a market
basket analysis where every product in the store has to be considered in each
transaction. For example, in a basket where bread and milk are bought (i.e. milk
and bread are the positive items), all the other products in the store become part
of the transaction as negative items.
3. The total number of association rules that can be discovered when negative items
are considered is $5^d - 2 \times 3^d + 1$. A detailed calculation for the formula
can be found in [18]. The number of association rules for positive items in a
transaction is $3^d - 2^{d+1} + 1$. For our small example, this means that we can
find up to 180 positive rules and up to 2640 when the negative items are considered
as well.
4. The number of candidate itemsets is reduced when mining positive association
rules by support-based pruning. This pruning is no longer effective in a
transactional database that is augmented with the negative items. Given that the
supports of a negative item and its positive counterpart satisfy
$s(\neg i_k) + s(i_k) = 1$, either the negative or the positive item will have a
support large enough to pass the minimum support threshold (for any threshold of
at most 50 %).
Given the reasons above, traditional association rule mining algorithms cannot
cope with mining rules when negative items are considered. New algorithms are
therefore needed to mine association rules with negative items efficiently. Here
we survey algorithms that efficiently mine some variety of negative associations
from data.
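The figures quoted in the list above can be checked with a short script. This is only a sketch under stated assumptions: a rule is taken to be any pair of non-empty antecedent and consequent built from distinct items, d = 5 is assumed because it reproduces the quoted 180 and 2640, and the function names and the toy baskets are illustrative, not taken from the text.

```python
from itertools import product


def positive_rule_count(d: int) -> int:
    """Rules over d positive items: 3^d - 2^(d+1) + 1."""
    return 3**d - 2 ** (d + 1) + 1


def negative_rule_count(d: int) -> int:
    """Rules when every item may also appear negated: 5^d - 2*3^d + 1."""
    return 5**d - 2 * 3**d + 1


def brute_force_count(d: int, with_negatives: bool) -> int:
    """Count rules by enumerating the role each of the d items plays.

    Roles: 0 = absent, 1 = antecedent, 2 = consequent,
           3 = negated in antecedent, 4 = negated in consequent.
    """
    roles = range(5) if with_negatives else range(3)
    count = 0
    for assignment in product(roles, repeat=d):
        has_antecedent = any(r in (1, 3) for r in assignment)
        has_consequent = any(r in (2, 4) for r in assignment)
        if has_antecedent and has_consequent:
            count += 1
    return count


def augment(transaction, items):
    """Add a negative item ("not", i) for every item missing from the transaction."""
    return set(transaction) | {("not", i) for i in items if i not in transaction}


def support(item, transactions) -> float:
    """Fraction of transactions that contain the given (positive or negative) item."""
    return sum(item in t for t in transactions) / len(transactions)


# Hypothetical toy data: five items, three baskets.
items = ["bread", "milk", "eggs", "tea", "jam"]
baskets = [{"bread", "milk"}, {"milk"}, {"eggs", "tea", "jam"}]
augmented = [augment(t, items) for t in baskets]

print(positive_rule_count(5), negative_rule_count(5))  # 180 2640
print([len(t) for t in augmented])  # every augmented basket has len(items) entries
for i in items:
    # s(i) + s(not i) = 1, so at least one of the two is >= 0.5
    print(i, support(i, augmented), support(("not", i), augmented))
```

The enumeration agrees with both closed forms for d = 5. The augmented baskets all have length d, and for each item the larger of the two supports is at least 0.5, which is why support-based pruning removes so few candidates once negative items are added.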
3 Current Approaches
In this section we present current approaches proposed in the literature to discover
negative association rules. We illustrate in Example 1 how rules discovered in the
support-confidence framework can sometimes be misleading and how the negative
associations discovered in data can shed new light on the discovered patterns.
Example 1 Let us consider an example from market basket data. In this example
we want to study the purchase of cow's milk (CM) versus soy milk (SM) in a grocery
store. Table 6.2 gives the data collected from 100 baskets in the store. In Table 6.2,
“CM” means the basket contains cow's milk and “¬CM” means the basket does not
contain cow's milk. The same applies to soy milk.
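A minimal sketch of how support and confidence can be evaluated on this data, assuming the counts of Table 6.2 are encoded as a dictionary keyed by (contains SM, contains CM). The names `counts`, `count`, `support` and `confidence` are illustrative and do not come from the text.

```python
# Basket counts from Table 6.2; keys are (contains SM, contains CM).
counts = {
    (True, True): 20,
    (True, False): 60,
    (False, True): 20,
    (False, False): 0,
}
total = sum(counts.values())  # 100 baskets


def count(sm=None, cm=None) -> int:
    """Number of baskets matching the constraints (None means "don't care")."""
    return sum(
        n
        for (has_sm, has_cm), n in counts.items()
        if (sm is None or has_sm == sm) and (cm is None or has_cm == cm)
    )


def support(sm=None, cm=None) -> float:
    """Fraction of all baskets matching the constraints."""
    return count(sm, cm) / total


def confidence(antecedent: dict, consequent: dict) -> float:
    """support(antecedent and consequent) / support(antecedent)."""
    return count(**antecedent, **consequent) / count(**antecedent)


print(support(sm=True, cm=True))                    # support of SM and CM together
print(confidence({"sm": True}, {"cm": True}))       # rule SM -> CM
print(confidence({"cm": True}, {"sm": True}))       # rule CM -> SM
print(confidence({"sm": True}, {"cm": False}))      # rule SM -> not CM
```

Passing `False` for an item expresses its negation, so the same helper evaluates rules with negative items as well as purely positive ones.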
In this data, let us find the positive association rules in the “support-confidence”
framework. The association rule “SM → CM” has 20 % support and 25 % confidence
(support(SM ∧ CM)/support(SM)). The association rule “CM → SM” has 20 %
support and 50 % confidence (support(SM ∧ CM)/support(CM)). The support is
 