Negative Association Rules - Frequent Pattern Mining


considered fairly high for both rules. Although we may reject the first rule on the
basis of its confidence, the second rule seems valid and may be considered in the
data analysis. However, when a statistical significance test is applied, such as the
statistical correlation between the SM and CM items, one finds that the two
items are actually negatively correlated. This shows that the rule “CM ⇒ SM” is
misleading. The example shows not only the importance of considering negative
association rules, but also the importance of the statistical significance of the
patterns discovered.
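To make the gap between confidence and correlation concrete, the following minimal Python sketch computes the support, confidence, and lift of a rule from raw counts. The counts are hypothetical and are not the figures of the example discussed above, and the function name `rule_stats` is an assumption made for illustration. Lift is used here as one simple correlation indicator: a value below 1 means the two items co-occur less often than independence would predict.

```python
def rule_stats(n_total, n_x, n_y, n_xy):
    """Return (support, confidence, lift) of the rule X => Y from raw counts.

    n_total: number of transactions
    n_x, n_y: transactions containing X, respectively Y
    n_xy: transactions containing both X and Y
    """
    support = n_xy / n_total
    confidence = n_xy / n_x               # estimate of P(Y | X)
    lift = confidence / (n_y / n_total)   # P(Y | X) / P(Y)
    return support, confidence, lift


# Hypothetical counts: 1000 transactions, 400 with CM, 900 with SM, 300 with both.
support, confidence, lift = rule_stats(n_total=1000, n_x=400, n_y=900, n_xy=300)

print(f"support={support:.2f} confidence={confidence:.2f} lift={lift:.3f}")
if lift < 1:
    print("CM and SM are negatively correlated despite the high confidence")
```

With these assumed counts the rule reaches a confidence of 0.75, yet SM appears in 90% of all transactions, so knowing that CM is present actually lowers the chance of seeing SM.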

The problem of finding negative association rules is complex and computationally
intensive, as discussed in Sect. 2. A common way to deal with this complexity
is to focus the search on special cases of interest. Some techniques employ domain
knowledge to guide the search, some focus on a certain type of rule of interest,
while others use interestingness measures to mine for statistically
significant patterns. We give more details about some of the approaches that have been
proposed in the literature for mining association rules with negations.

Brin et al. [8] were the first to mention the notion of negative relationships in
the literature. They proposed to use the chi-square test between two itemsets. The
statistical test verifies the independence between the two itemsets. To determine the
nature (positive or negative) of the relationship, a correlation metric is used. The
negative association rules that could be discovered based on these measures are the
following: ¬X ⇒ Y, X ⇒ ¬Y, and ¬X ⇒ ¬Y. One limitation of this method is
that the computation of the χ² measure can become expensive in large and dense
datasets.
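The two-step idea described above (test for independence, then determine the direction of the relationship) can be sketched as follows for a pair of single items. This is a minimal illustration under stated assumptions, not the procedure of [8]: the function names are hypothetical, the direction is decided by comparing the observed co-occurrence count with the count expected under independence, and 3.841 is the standard chi-square critical value for one degree of freedom at the 0.05 level.

```python
CHI2_CRITICAL_95 = 3.841  # chi-square critical value, 1 degree of freedom, alpha = 0.05


def chi_square_2x2(n_total, n_x, n_y, n_xy):
    """Chi-square statistic for the 2x2 presence/absence table of X and Y.

    Assumes every row and column total is non-zero, so that each
    expected cell count is positive.
    """
    chi2 = 0.0
    for has_x in (True, False):
        for has_y in (True, False):
            row_total = n_x if has_x else n_total - n_x
            col_total = n_y if has_y else n_total - n_y
            if has_x and has_y:
                observed = n_xy
            elif has_x:
                observed = n_x - n_xy
            elif has_y:
                observed = n_y - n_xy
            else:
                observed = n_total - n_x - n_y + n_xy
            expected = row_total * col_total / n_total
            chi2 += (observed - expected) ** 2 / expected
    return chi2


def relationship(n_total, n_x, n_y, n_xy):
    """Classify the X/Y relationship as positive, negative, or not significant."""
    if chi_square_2x2(n_total, n_x, n_y, n_xy) < CHI2_CRITICAL_95:
        return "not significant"  # independence cannot be rejected
    expected_xy = n_x * n_y / n_total
    return "positive" if n_xy > expected_xy else "negative"


# Hypothetical counts: 1000 transactions, 400 with X, 900 with Y, 300 with both.
print(relationship(1000, 400, 900, 300))
```

A "negative" outcome suggests looking at rules of the form X ⇒ ¬Y or ¬X ⇒ Y. The sketch evaluates a single pair; the cost noted above comes from repeating such a test over very many candidate itemsets.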

Aggarwal and Yu [2, 3] introduced a new method for finding interesting itemsets
in data. Their method is based on mining strongly collective itemsets. The collective
strength of an itemset I is defined as follows:

\[
C(I) = \frac{1 - v(I)}{1 - E[v(I)]} \times \frac{E[v(I)]}{v(I)} \tag{6.1}
\]


where v(I) is the violation rate of the itemset I, that is, the fraction of
transactions in which I is violated, and E[v(I)] is its expected value. An itemset I is
violated by a transaction if some, but not all, of its items appear in that transaction. The
collective strength ranges from 0 to ∞, where a value of 0 means that the items are
perfectly negatively correlated and a value of ∞ means that the items are perfectly
positively correlated. A value of 1 indicates that the violation rate is exactly the same as its
expected value, meaning statistical independence. The advantage of mining itemsets
with collective strength is that the method finds statistically significant patterns. In
addition, this model has good computational efficiency, which makes it well suited to
mining dense datasets. This property, along with the symmetry of the collective strength
measure, makes this method a good candidate for mining negative association rules
in data.
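The collective strength of Eq. 6.1 can be computed directly from a list of transactions, as in the minimal Python sketch below. It rests on one assumption that this section does not spell out: the expected violation rate E[v(I)] is taken under statistical independence of the items, so that E[v(I)] = 1 − ∏ pᵢ − ∏ (1 − pᵢ), where pᵢ is the relative frequency of item i. The function name and the toy transactions are hypothetical.

```python
from math import prod


def collective_strength(transactions, itemset):
    """Collective strength C(I) of `itemset` over a list of transactions (sets)."""
    itemset = set(itemset)
    n = len(transactions)

    # A transaction violates I when it contains some, but not all, items of I.
    violations = sum(
        1 for t in transactions if 0 < len(itemset & set(t)) < len(itemset)
    )
    v = violations / n

    # Assumption: expected violation rate if the items were independent.
    p = [sum(1 for t in transactions if item in t) / n for item in itemset]
    expected_v = 1 - prod(p) - prod(1 - q for q in p)

    if not 0 < expected_v < 1:
        raise ValueError("collective strength is undefined for this itemset")
    if v == 0:
        return float("inf")  # never violated: perfectly positively correlated
    return (1 - v) / (1 - expected_v) * expected_v / v


# Toy data (hypothetical), each list holding four transactions.
together = [{"a", "b"}, {"a", "b"}, {"c"}, {"c"}]   # a and b always co-occur
apart = [{"a"}, {"a"}, {"b"}, {"b"}]                # a and b never co-occur
mixed = [{"a", "b"}, {"a"}, {"b"}, set()]           # a and b are independent

for name, data in [("together", together), ("apart", apart), ("mixed", mixed)]:
    print(name, collective_strength(data, {"a", "b"}))
```

The three toy datasets hit the three reference points of the measure: ∞ for items that always co-occur, 0 for items that never do, and 1 when the violation rate matches its expected value.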

In [19], the authors present a new idea to mine strong negative rules. They combine
positive frequent itemsets with domain knowledge in the form of a taxonomy to mine
negative associations. The idea is to reduce the search space by constraining the
search to the positive patterns that pass the minimum support threshold. When all the
