Efficiently Identifying Exploratory Rules’ Significance - Data Mining: Theory, Methodology, Techniques, and Applications


provide a clear idea of insignificant rules, we first introduce the concept of rule improvement defined by Bayardo et al. [5]. Confidence improvement, used here as an example, defines the minimum improvement in confidence that a propositional rule must exhibit in order to be regarded as potentially interesting:

$$\mathrm{imp}(A \rightarrow C) = \min\bigl(\forall A' \subset A,\ \mathrm{confidence}(A \rightarrow C) - \mathrm{confidence}(A' \rightarrow C)\bigr)$$
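The improvement measure can be illustrated with a minimal sketch. It assumes transactions are represented as Python sets of items and that the empty antecedent counts as a proper subset; the function names `confidence` and `improvement` and the toy data are hypothetical and do not come from Bayardo et al.

```python
from itertools import combinations


def confidence(transactions, antecedent, consequent):
    """Fraction of transactions covering the antecedent that also contain the consequent."""
    covered = [t for t in transactions if antecedent <= t]
    if not covered:
        return 0.0
    return sum(1 for t in covered if consequent in t) / len(covered)


def improvement(transactions, antecedent, consequent):
    """Minimum confidence gain of A -> C over every proper sub-rule A' -> C."""
    antecedent = frozenset(antecedent)
    base = confidence(transactions, antecedent, consequent)
    gains = []
    # Sizes 0 .. |A|-1 enumerate all proper subsets (assumption: the empty set is included).
    for size in range(len(antecedent)):
        for subset in combinations(sorted(antecedent), size):
            sub_conf = confidence(transactions, frozenset(subset), consequent)
            gains.append(base - sub_conf)
    return min(gains)


# Hypothetical toy data: each transaction is a set of items.
toy = [{"a", "b", "c"}, {"a", "b", "c"}, {"a", "c"}, {"a"},
       {"b", "c"}, {"b"}, {"c"}, set()]
toy_improvement = improvement(toy, {"a", "b"}, "c")
```

A rule would then be kept only if its improvement reaches a user-chosen minimum. Because the value is computed from a sample, it is an estimate of the true improvement rather than the true improvement itself.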

It is argued that setting a minimum improvement helps discard potentially uninteresting exploratory rules. However, the values being compared are derived from samples rather than from the total population. The observed improvement is therefore only an estimate of the true improvement, and taking no account of the quality of that estimate is likely to result in poor decisions.

Rule filtering techniques based on significance are concerned with the statistical significance of the improvement rather than with the values of interestingness measures. Statistical tests are applied to the resulting rules, and those that fall within expectation (that is, are not surprising enough) are automatically removed. Such techniques are subject to type-1 errors, which result in accepting spurious or uninteresting rules, and type-2 errors, which result in rejecting rules that are not spurious. Webb [15] proposed a technique for statistically sound exploratory rule discovery that uses a holdout set to validate the resulting rules.

3.2 Statistical Significance of Rules

The chi-square test is widely used to test whether the antecedent and consequent of a propositional rule are independent. Liu et al. [10] studied association rules with a fixed attribute as the consequent. They used a chi-square test to decide whether the antecedent of a rule is independent of its consequent, accepting only rules whose antecedent and consequent are positively correlated and thus discarding rules that appear interesting only by chance. Rules discarded by an independence test are referred to as insignificant rules.
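An independence check of this kind can be sketched as follows. This is a minimal illustration and not the procedure of Liu et al.: it assumes a 2x2 contingency table of counts with non-zero row and column totals, uses the Pearson chi-square statistic with one degree of freedom, and treats the significance level `alpha = 0.05` as an arbitrary choice. The function names are hypothetical.

```python
import math


def chi_square_2x2(n_ac, n_a_notc, n_nota_c, n_nota_notc):
    """Pearson chi-square statistic and p-value (1 degree of freedom) for a 2x2 table.

    Rows: antecedent A present / absent. Columns: consequent C present / absent.
    """
    observed = [[n_ac, n_a_notc], [n_nota_c, n_nota_notc]]
    total = sum(sum(row) for row in observed)
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    statistic = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed[i][j] - expected) ** 2 / expected
    # For 1 degree of freedom the chi-square survival function is erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


def keep_rule(n_ac, n_a_notc, n_nota_c, n_nota_notc, alpha=0.05):
    """Keep A -> C only if A and C are significantly and positively correlated."""
    _, p_value = chi_square_2x2(n_ac, n_a_notc, n_nota_c, n_nota_notc)
    total = n_ac + n_a_notc + n_nota_c + n_nota_notc
    expected_ac = (n_ac + n_a_notc) * (n_ac + n_nota_c) / total
    # Positive correlation: A and C co-occur more often than independence predicts.
    return p_value < alpha and n_ac > expected_ac
```

For example, with hypothetical counts, `keep_rule(54, 6, 16, 24)` accepts the rule, whereas `keep_rule(6, 54, 24, 16)` rejects it because the dependence is negative.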

Consider the following Boolean-consequent rules:

A → C [support = 60%, confidence = 90%]
A & B → C [support = 45%, confidence = 91%]
A & D → C [support = 46%, confidence = 70%]

It is highly likely that conditions B and C are conditionally independent given A, so the second rule provides little interesting information. According to Liu et al., the third rule carries no interesting information either. It should also be discarded, because condition D is negatively correlated with condition C, given A. Bay and Pazzani [4] also used the chi-square test to decide the significance of contrast sets. Webb [15] proposed a statistically sound technique for filtering insignificant rules, using the Fisher exact test and a holdout set.
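A conditional check of this kind can be sketched with a one-sided Fisher exact test. This is an illustrative sketch only, not Webb's procedure. It assumes the counts are taken from the records covered by A, with rows for the extra condition (for example B) being present or absent and columns for C being present or absent. The function name `fisher_exact_greater` is hypothetical.

```python
from math import comb


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    Returns the probability, under the hypergeometric null hypothesis of
    independence with fixed margins, of seeing a count of at least `a`
    in the top-left cell.
    """
    row1 = a + b          # records where the extra condition holds
    col1 = a + c          # records where the consequent holds
    n = a + b + c + d     # all records covered by the shared antecedent
    denominator = comb(n, col1)
    upper = min(row1, col1)
    numerator = sum(
        comb(row1, k) * comb(n - row1, col1 - k)  # comb returns 0 for impossible cells
        for k in range(a, upper + 1)
    )
    return numerator / denominator
```

A small p-value suggests that adding the extra condition raises the proportion of C among the records covered by A. A large p-value, as would be expected when the extra condition is independent of C or negatively correlated with it given A, gives no grounds for keeping the more specific rule. In a holdout setting, the counts would be drawn from data that was not used to discover the rules.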
