considered fairly high for both rules. Although we may reject the first rule on the
basis of its confidence, the second rule seems valid and may be considered in the
data analysis. However, when a statistical significance test is applied, such as
the statistical correlation between the SM and CM items, one finds that the two
items are actually negatively correlated. This shows that the rule “CM ⇒ SM” is
misleading. This example shows not only the importance of considering negative
association rules, but also the importance of the statistical significance of the
patterns discovered.
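The effect described above can be reproduced with a few lines of Python. This is a minimal sketch: the function name `rule_stats` and the counts are hypothetical and are not the figures behind the CM and SM example. It uses lift as a simple stand-in for a correlation measure, where a lift below 1 means the two items co-occur less often than independence would predict.

```python
def rule_stats(n, n_x, n_y, n_xy):
    """Support, confidence and lift of the rule X => Y from raw counts.

    n    : total number of transactions
    n_x  : transactions containing X
    n_y  : transactions containing Y
    n_xy : transactions containing both X and Y
    """
    support = n_xy / n
    confidence = n_xy / n_x
    # Lift compares the confidence with the baseline frequency of Y.
    # lift < 1: negative correlation, lift > 1: positive correlation.
    lift = confidence / (n_y / n)
    return support, confidence, lift


# Hypothetical counts chosen for illustration only.
support, confidence, lift = rule_stats(n=100, n_x=40, n_y=80, n_xy=28)
print(f"support={support:.2f} confidence={confidence:.2f} lift={lift:.3f}")
```

With these counts the rule has a confidence of 0.70, which would pass many confidence thresholds, yet its lift is below 1, so the antecedent actually makes the consequent less likely than its baseline frequency.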
The problem of finding negative association rules is complex and computationally
intensive, as discussed in Sect. 2. A common way to deal with this complexity
is to focus the search on special cases of interest. Some techniques employ domain
knowledge to guide the search, some focus on a certain type of rule of interest,
while others use interestingness measures to mine for statistically
significant patterns. Below we give more details about some of the approaches that
have been proposed in the literature for mining association rules with negations.
Brin et al. [8] were the first to mention the notion of negative relationships in
the literature. They proposed using the chi-square test between two itemsets. The
statistical test checks whether the two itemsets are independent. To determine the
nature (positive or negative) of the relationship, a correlation metric is used. The
negative association rules that can be discovered based on these measures are the
following: ¬X ⇒ Y, X ⇒ ¬Y and ¬X ⇒ ¬Y. One limitation of this method is
that the computation of the χ² measure can become expensive in large and dense
datasets.
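The two-step idea (test for independence, then determine the direction of the relationship) can be sketched for a pair of single items as follows. This is an illustrative sketch, not the procedure of [8]: the function names and counts are hypothetical, the sign of the deviation from the expected co-occurrence count stands in for the correlation metric, and 3.841 is the standard χ² critical value for one degree of freedom at the 0.05 level.

```python
def chi_square_2x2(n, n_x, n_y, n_xy):
    """Pearson chi-square statistic for the 2x2 contingency table of X and Y.

    Assumes 0 < n_x < n and 0 < n_y < n so that no expected count is zero.
    """
    observed = {
        (True, True): n_xy,                      # X and Y
        (True, False): n_x - n_xy,               # X without Y
        (False, True): n_y - n_xy,               # Y without X
        (False, False): n - n_x - n_y + n_xy,    # neither
    }
    chi2 = 0.0
    for (has_x, has_y), obs in observed.items():
        row = n_x if has_x else n - n_x
        col = n_y if has_y else n - n_y
        expected = row * col / n  # count expected under independence
        chi2 += (obs - expected) ** 2 / expected
    return chi2


def correlation_sign(n, n_x, n_y, n_xy):
    """+1 if X and Y co-occur more often than expected, -1 if less, 0 if equal."""
    diff = n_xy * n - n_x * n_y
    return (diff > 0) - (diff < 0)


CRITICAL_95_DF1 = 3.841  # chi-square critical value, 1 degree of freedom, alpha = 0.05

# Hypothetical counts chosen for illustration only.
counts = dict(n=100, n_x=40, n_y=80, n_xy=28)
chi2 = chi_square_2x2(**counts)
sign = correlation_sign(**counts)
if chi2 > CRITICAL_95_DF1 and sign < 0:
    print("dependent and negatively correlated: consider X => not Y")
```

Each call scans all four cells of one contingency table; the cost noted in the text comes from having to build such tables for many candidate itemsets in large, dense data.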
Aggarwal and Yu [2, 3] introduced a new method for finding interesting itemsets
in data. Their method is based on mining strongly collective itemsets. The collective
strength of an itemset I is defined as follows:

$$C(I) = \frac{1 - v(I)}{1 - E[v(I)]} \times \frac{E[v(I)]}{v(I)} \tag{6.1}$$

where v(I) is the violation rate of an itemset I, that is, the fraction of
transactions in which I is violated, and E[v(I)] is its expected value. An itemset I
is in violation of a transaction if some, but not all, of its items appear in that
transaction. The collective strength ranges from 0 to ∞, where a value of 0 means
that the items are perfectly negatively correlated and a value of ∞ means that the
items are perfectly positively correlated. A value of 1 indicates that the violation
rate is exactly the same as its expected value, meaning statistical independence.
The advantage of mining itemsets with collective strength is that the method finds
statistically significant patterns. In addition, this model has good computational
efficiency, which makes it well suited to mining dense datasets. This property, along
with the symmetry of the collective strength measure, makes this method a good
candidate for mining negative association rules in data.
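As an illustration of Eq. (6.1), the sketch below computes the collective strength of an itemset over a list of transactions. The text does not say how E[v(I)] is obtained; the sketch assumes it is computed under statistical independence of the items, so that a transaction is expected to be violation-free only when it contains all of the items or none of them. The function name and the toy transactions are hypothetical.

```python
from math import prod


def collective_strength(transactions, itemset):
    """Collective strength C(I) of `itemset` over a list of transactions (sets).

    Assumption: E[v(I)] is taken under independence of the items, i.e.
    E[v(I)] = 1 - prod(p_i) - prod(1 - p_i), with p_i the frequency of item i.
    The degenerate case E[v(I)] == 1 is not handled.
    """
    itemset = set(itemset)
    n = len(transactions)

    # A transaction violates I if it holds some, but not all, items of I.
    violations = sum(
        1 for t in transactions if 0 < len(itemset & set(t)) < len(itemset)
    )
    v = violations / n

    # Individual item frequencies and the expected violation rate.
    p = [sum(1 for t in transactions if item in t) / n for item in itemset]
    expected_v = 1 - prod(p) - prod(1 - q for q in p)

    if v == 0:
        return float("inf")  # no violations: perfectly positively correlated
    return (1 - v) / (1 - expected_v) * expected_v / v


# Toy transactions chosen for illustration only.
together = [{"a", "b"}, {"a", "b"}, {"a"}, {"b"}, set(), set()]
apart = [{"a"}, {"b"}, {"a"}, {"b"}]
print(collective_strength(together, {"a", "b"}))  # greater than 1
print(collective_strength(apart, {"a", "b"}))     # 0.0
```

In the second toy dataset the two items never appear together, so every transaction is a violation and the collective strength drops to 0, the signature of a candidate negative association.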
In [19] the authors present a new idea for mining strong negative rules. They combine
positive frequent itemsets with domain knowledge in the form of a taxonomy to mine
negative associations. The idea is to reduce the search space by constraining the
search to the positive patterns that pass the minimum support threshold. When all the
 