provide a clear idea of insignificant rules, we first introduce the concept of
rule improvement defined by Bayardo et al. [5]. Confidence improvement, used
here as an example, defines the minimum improvement in confidence that a
propositional rule must exhibit in order to be regarded as potentially interesting:
\[ \mathrm{imp}(A \rightarrow C) = \min_{A' \subset A} \bigl( \mathrm{confidence}(A \rightarrow C) - \mathrm{confidence}(A' \rightarrow C) \bigr) \]
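The definition above can be sketched in a few lines of Python. This is an illustrative sketch only, not code from Bayardo et al. [5]: the function names `confidence` and `improvement`, the representation of transactions as sets of item labels, and the toy data are all assumptions made for the example. It also assumes that the empty antecedent counts as a proper subset of A.

```python
from itertools import combinations


def confidence(transactions, antecedent, consequent):
    """Fraction of transactions covered by the antecedent that also contain the consequent."""
    covered = [t for t in transactions if antecedent <= t]
    if not covered:
        return 0.0
    return sum(1 for t in covered if consequent in t) / len(covered)


def improvement(transactions, antecedent, consequent):
    """Minimum confidence gain of A -> C over every proper subset A' of A.

    Assumption: the empty antecedent is treated as a proper subset of A.
    """
    if not antecedent:
        raise ValueError("antecedent must contain at least one condition")
    full = confidence(transactions, antecedent, consequent)
    items = sorted(antecedent)
    gains = []
    # Sizes 0 .. |A| - 1 enumerate exactly the proper subsets of A.
    for size in range(len(items)):
        for subset in combinations(items, size):
            sub_conf = confidence(transactions, frozenset(subset), consequent)
            gains.append(full - sub_conf)
    return min(gains)


# Hypothetical toy data: each transaction is a set of item labels.
toy = [
    frozenset({"a", "b", "c"}),
    frozenset({"a", "b", "c"}),
    frozenset({"a", "c"}),
    frozenset({"a"}),
    frozenset({"b"}),
    frozenset(),
]
rule_improvement = improvement(toy, frozenset({"a", "b"}), "c")
```

A rule would then be kept only if `rule_improvement` reaches a user-chosen minimum. Enumerating every proper subset is exponential in the size of the antecedent, so this form is suitable only for short rules.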
It is argued that setting a minimum improvement is desirable for discarding
potentially uninteresting exploratory rules. However, the values used for
comparison are derived from samples rather than from the total population.
The observed improvement therefore provides only an estimate of the true
improvement, and if no account is taken of the quality of that estimate, it
is likely to result in poor decisions.
Rule filtering techniques based on the significance of rules are concerned with
the statistical significance of the improvement, rather than with the values of
interestingness measures. Statistical tests are applied to the resulting rules,
and those within expectation (or without enough surprisingness) are automatically
removed. Such techniques may lead to type-1 errors, which result in accepting
spurious or uninteresting rules, and type-2 errors, which result in rejecting
rules that are not spurious. Webb [15] proposed a technique for statistically
sound exploratory rule discovery that uses a holdout set to validate the
resulting rules.
3.2 Statistical Significance of Rules
The chi-square test is widely used for testing the independence of propositional
rules. Liu et al. [10] studied association rules with a fixed attribute as the
consequent. They used a chi-square test to decide whether the antecedent of a
rule is independent of its consequent, accepting only rules whose antecedent
and consequent are positively correlated and thus discarding rules that appear
interesting only by chance. The rules discarded by such an independence test
are referred to as insignificant rules.
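As a rough illustration of this kind of filter, the sketch below computes the Pearson chi-square statistic for the 2x2 contingency table of antecedent against consequent and accepts a rule only when the two are positively correlated and the statistic exceeds the 5% critical value for one degree of freedom (3.841). The function names, the choice of significance level, and the counts are assumptions made for the example. It is not the exact procedure of Liu et al. [10].

```python
CRITICAL_95 = 3.841  # chi-square critical value, 1 degree of freedom, alpha = 0.05


def chi_square_2x2(n_ac, n_a_not_c, n_not_a_c, n_not_a_not_c):
    """Pearson chi-square statistic for a 2x2 table of antecedent vs. consequent.

    Assumes every row and column total is positive.
    """
    observed = [[n_ac, n_a_not_c], [n_not_a_c, n_not_a_not_c]]
    total = sum(sum(row) for row in observed)
    row_totals = [sum(row) for row in observed]
    col_totals = [observed[0][j] + observed[1][j] for j in range(2)]
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed[i][j] - expected) ** 2 / expected
    return stat


def is_significant_positive(n_ac, n_a_not_c, n_not_a_c, n_not_a_not_c):
    """True if A and C are positively correlated and the test rejects independence."""
    total = n_ac + n_a_not_c + n_not_a_c + n_not_a_not_c
    # Positive correlation: observed count of (A and C) exceeds its expected count.
    positive = n_ac * total > (n_ac + n_a_not_c) * (n_ac + n_not_a_c)
    stat = chi_square_2x2(n_ac, n_a_not_c, n_not_a_c, n_not_a_not_c)
    return positive and stat > CRITICAL_95


# Hypothetical counts out of 100 records: A&C, A&notC, notA&C, notA&notC.
keep_rule = is_significant_positive(54, 6, 20, 20)
```

With these hypothetical counts the rule is kept. Swapping the two rows makes the correlation negative, and the rule is discarded regardless of the size of the statistic.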
Consider the following Boolean-consequent rules:
A → C [support = 60%, confidence = 90%]
A & B → C [support = 45%, confidence = 91%]
A & D → C [support = 46%, confidence = 70%]
There is a high possibility that the conditions B and C are conditionally
independent given A, so the second rule provides little interesting information.
According to Liu et al., the third rule does not bear interesting information
either. It should also be discarded, because the condition D is negatively
correlated with condition C, given A. Bay and Pazzani [4] also made use of the
chi-square test to decide the significance of contrast sets. Webb [15] proposed
a statistically sound technique for filtering insignificant rules, using the
Fisher exact test and a holdout set.
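To make the conditional comparison concrete, the following sketch applies a one-sided Fisher exact test to the records covered by A, asking whether adding a condition such as D raises the frequency of C. It is one plausible reading of the idea, not Webb's procedure [15]: the function name `fisher_exact_greater`, the 0.05 threshold, and the counts (imagined as coming from a holdout set) are all assumptions made for the example.

```python
from math import comb


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    Returns the probability, under independence with fixed margins, that the
    top-left cell is at least as large as the observed value a.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2
    upper = min(row1, col1)
    numerator = sum(
        comb(row1, k) * comb(row2, col1 - k) for k in range(a, upper + 1)
    )
    return numerator / comb(n, col1)


# Hypothetical holdout counts, restricted to records covered by A:
# rows are (D, not D), columns are (C, not C).
p_value = fisher_exact_greater(7, 3, 9, 1)

# Under this sketch the specialised rule A & D -> C is kept only if adding D
# significantly raises the frequency of C among the records covered by A.
keep_specialisation = p_value < 0.05
```

In these made-up counts C is less frequent with D than without it, so the p-value is large and the specialised rule is discarded, in line with the treatment of the third rule above.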