Efficiently Identifying Exploratory Rules’ Significance - Data Mining: Theory, Methodology, Techniques, and Applications


descriptions of the interrelationship between quantitative attributes and qualitative attributes.

Here are some notions of exploratory rule discovery that we use in this paper:

1. A dataset is a finite set of records.

2. For propositional rule discovery, a record is an element to which we apply Boolean predicates called conditions, while for distributional-consequent rule discovery, a record is a pair <c, v>, where c is the nonempty set of Boolean conditions, and v is a set of values for the quantitative variables in whose distribution the users are interested.

3. A rule is in the form of A → C. For propositional rules, both A and C are conjunctions of Boolean conditions. The status of such a rule is described by interestingness measures like the support and the confidence. By contrast, for distributional-consequent rule discovery, A is a conjunction of Boolean conditions while C is a nonempty set of target quantitative variables in which the users are interested. The quantitative variables are described by distributional statistics. To avoid confusion, we prefer to write A → target to denote a distributional-consequent rule.

4. Rule A → C is a parent of B → C if A ⊂ B. If |A| = |B| − 1, then the first rule is a direct parent of the second one; otherwise, it is a grandparent of the second rule.

5. We use the notion coverset(A), where A is a conjunction of conditions, to represent the set of records that satisfy the condition (or set of conditions) A. If a record is in coverset(A), we say that the record is covered by A. If A is ∅, coverset(A) includes all the records in the database.

6. Coverage(A) is the number of records covered by A: coverage(A) = |coverset(A)|.
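The coverset, coverage, and parent notions above can be illustrated with a minimal Python sketch. It rests on assumptions that are not part of the text: each record is modelled as the set of condition names it satisfies, the condition names "x", "y", and "z" are hypothetical, and the function names simply mirror the notions for readability.

```python
from typing import FrozenSet, List

# Assumption: a record is modelled as the set of condition names it satisfies.
Record = FrozenSet[str]
# A conjunction of Boolean conditions (an antecedent A) is a set of condition names.
Conditions = FrozenSet[str]


def coverset(a: Conditions, dataset: List[Record]) -> List[Record]:
    """Records that satisfy every condition in a.

    The empty conjunction is a subset of every record, so it covers the
    whole dataset.
    """
    return [record for record in dataset if a <= record]


def coverage(a: Conditions, dataset: List[Record]) -> int:
    """Number of records covered by a, i.e. |coverset(a)|."""
    return len(coverset(a, dataset))


def is_parent(a: Conditions, b: Conditions) -> bool:
    """A -> C is a parent of B -> C when A is a proper subset of B."""
    return a < b


def is_direct_parent(a: Conditions, b: Conditions) -> bool:
    """The parent relationship is direct when, in addition, |A| = |B| - 1."""
    return a < b and len(a) == len(b) - 1


# Hypothetical toy dataset with three records.
dataset: List[Record] = [
    frozenset({"x", "y"}),
    frozenset({"x"}),
    frozenset({"y", "z"}),
]
```

With this toy dataset, `coverage(frozenset(), dataset)` counts all three records, while `coverage(frozenset({"x"}), dataset)` counts only the two records that satisfy "x". Using set inclusion for both coverage and the parent test keeps the sketch close to the set notation of the definitions; a real implementation would evaluate predicates on attribute values instead.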

3 Insignificant Exploratory Rules

As mentioned before, exploratory rule discovery searches for multiple models in a database and may therefore discover spurious or uninteresting rules. Reducing the number of resulting rules thus becomes a concern. One approach is for the users to define a suitable set of constraints that the algorithm can use to automatically discard some potentially uninteresting rules. Another approach is to compare the resulting rules with one another, so as to present the users with a more compact set of models. Techniques for automatically removing potentially uninteresting rules are summarized by Huang and Webb [8].

3.1 Improvement

Filtering insignificant rules using statistical tests is one of the interesting topics of research. With this technique, we perform significance tests among rules and discard those that appear interesting only by chance. To
