Database Reference
In-Depth Information
where n is the total number of transactions. The chance threshold is calculated
independently for each candidate itemset:
min t
p
i = t
chance ( n , s ( X ), s ( Y ), p )
=
|
Pcc ( i
|
n , s ( X ), s ( Y ))
(6.4)
i = 0
Normally, for a positive association, p -value is set to be very high (usually 0.9999),
on the other hand, for a negative association, p -value is set to be very low (usually
0.001). The whole algorithm develops in an iterative way with rule generation and
rule pruning. An itemset with a support greater than the positive chance threshold
is considered for positive rule generation, while itemset with a support less than the
negative chance threshold is considered for negative rule generation. In this way, the
algorithm discovers three different types of negative association rules in the form of
X
⇒¬
¬
¬
⇒¬
⇒¬
¬
Y ,
X
Y and
X
Y . The first two types X
Y ,
X
Y can be
generated from the negative itemsets if the rule X
Y satisfies the negative chance
threshold and minimum confidence threshold. On the other hand, the rules in the
form of
¬ X ⇒¬ Y are derived from the positive itemsets if they meet the positive
chance threshold and minimum confidence threshold.
Kingfisher [ 12 , 13 ] is an algorithm developed to discover positive and negative
dependency rules. The dependency rule can be formulated on the basis of association
rule, that the association rule X
=
P ( X ) P ( Y ). The dependency is positive, if P ( X , Y ) >P ( X ) P ( Y ); and negative,
if P ( X , Y ) <P ( X ) P ( Y ). Otherwise, the rule is an independent rule. The author
concentrated on a specific type of dependency rules, the rules with only one single
consequent attribute. It can be noticed that the negative dependency for the rules
X
Y is defined as a dependency rule if P ( X , Y )
Y or
¬
X
⇒¬
Y are the same as the positive dependency for the rules X
⇒¬
Y
and
¬
X
Y , therefore, it is enough to only consider the positive dependency rules
X
⇒¬
Y or
¬
X
Y . The statistical dependency of the rule X
Y , is measured
by Fisher's exact test, the p -value, can be calculated:
s ( XY = y ) + i
s ( ¬ XY = y ) + i
n
s ( Y
min { s ( XY = y ), s ( ¬ X , Y = y ) }
s ( X )
s ( ¬ X )
p F ( X
Y
=
y )
=
y )
(6.5)
=
i = 0
where y
denotes the presence or absent of Y , and n is the total number of
transactions. It can also be observed that p F ( X
∈{
0, 1
}
⇒¬
Y )
=
p F (
¬
X
Y ), therefore,
it is enough to consider the negative rules in the form of X
Y . An important task
of rule mining is to find the non-redundant rules. Rules are considered as redundant
when they do not add new information to the remaining rules. In order to reduce
the number of discovered rules, Kingfisher focused on finding non-redundant rules.
The rule X
⇒¬
Y
=
y is non-redundant, if there does not exist any rules in the form
of X
y such that X
X and p F ( X
y ),
otherwise, the rule is considered as redundant. However, the statistical dependency
is not a monotonic property, it is impossible to do some frequency-based pruning
as Apriori-like algorithms. A straightforward solution is to list all possible negative
rules in the form of X ⇒¬ Y in the whole search space via an enumeration tree, and
Y
=
Y
=
y ) <p F ( X
Y
=
Search WWH ::




Custom Search