Negative Association Rules - Frequent Pattern Mining

Database Reference

In-Depth Information

where n is the total number of transactions. The chance threshold is calculated

independently for each candidate itemset:

min t

i = t

chance ( n , s ( X ), s ( Y ), p )

Pcc ( i

n , s ( X ), s ( Y ))

≥

(6.4)

i = 0

Normally, for a positive association, p -value is set to be very high (usually 0.9999),

on the other hand, for a negative association, p -value is set to be very low (usually

0.001). The whole algorithm develops in an iterative way with rule generation and

rule pruning. An itemset with a support greater than the positive chance threshold

is considered for positive rule generation, while itemset with a support less than the

negative chance threshold is considered for negative rule generation. In this way, the

algorithm discovers three different types of negative association rules in the form of

⇒¬

⇒

⇒¬

⇒

Y ,

Y and

Y . The first two types X

Y ,

Y can be

generated from the negative itemsets if the rule X

Y satisfies the negative chance

threshold and minimum confidence threshold. On the other hand, the rules in the

form of

⇒

¬ X ⇒¬ Y are derived from the positive itemsets if they meet the positive

chance threshold and minimum confidence threshold.

Kingfisher [ 12 , 13 ] is an algorithm developed to discover positive and negative

dependency rules. The dependency rule can be formulated on the basis of association

rule, that the association rule X

P ( X ) P ( Y ). The dependency is positive, if P ( X , Y ) >P ( X ) P ( Y ); and negative,

if P ( X , Y ) <P ( X ) P ( Y ). Otherwise, the rule is an independent rule. The author

concentrated on a specific type of dependency rules, the rules with only one single

consequent attribute. It can be noticed that the negative dependency for the rules

⇒

Y is defined as a dependency rule if P ( X , Y )

⇒

Y or

⇒¬

Y are the same as the positive dependency for the rules X

⇒¬

and

⇒

Y , therefore, it is enough to only consider the positive dependency rules

⇒¬

Y or

⇒

Y . The statistical dependency of the rule X

⇒

Y , is measured

by Fisher's exact test, the p -value, can be calculated:

s ( XY = y ) + i

s ( ¬ XY = y ) + i

s ( Y

min { s ( XY = y ), s ( ¬ X , Y = y ) }

s ( X )

s ( ¬ X )

p F ( X

⇒

y )

(6.5)

i = 0

where y

denotes the presence or absent of Y , and n is the total number of

transactions. It can also be observed that p F ( X

∈{

0, 1

}

⇒¬

Y )

p F (

⇒

Y ), therefore,

it is enough to consider the negative rules in the form of X

Y . An important task

of rule mining is to find the non-redundant rules. Rules are considered as redundant

when they do not add new information to the remaining rules. In order to reduce

the number of discovered rules, Kingfisher focused on finding non-redundant rules.

The rule X

⇒¬

⇒

y is non-redundant, if there does not exist any rules in the form

of X

y such that X

X and p F ( X

y ),

otherwise, the rule is considered as redundant. However, the statistical dependency

is not a monotonic property, it is impossible to do some frequency-based pruning

as Apriori-like algorithms. A straightforward solution is to list all possible negative

rules in the form of X ⇒¬ Y in the whole search space via an enumeration tree, and

⇒

y ) <p F ( X

⇒

Frequent Pattern Mining

Search WWH ::

Custom Search

Home