descriptions of the interrelationship between quantitative attributes and qualitative attributes.
Here are some notions of exploratory rule discovery that we use in this paper:
1. A dataset is a finite set of records.
2. For propositional rule discovery, a record is an element to which we apply Boolean predicates called conditions, while for distributional-consequent rule discovery, a record is a pair <c, v>, where c is the nonempty set of Boolean conditions, and v is a set of values for the quantitative variables in whose distribution the users are interested.
3. A rule is in the form of A → C. For propositional rules, both A and C are conjunctions of Boolean conditions. The status of such a rule is described by interestingness measures like the support and the confidence. In contrast, for distributional-consequent rule discovery, A is a conjunction of Boolean conditions while C is a nonempty set of target quantitative variables in which the users are interested. The quantitative variables are described by distributional statistics. We prefer using A → target to denote a distributional-consequent rule instead, for the purpose of avoiding confusion.
4. Rule A → C is a parent of B → C if A ⊂ B. If |A| = |B| − 1, then A → C is a direct parent of B → C; otherwise, it is a grandparent of B → C.
5. We use the notation coverset(A), where A is a conjunction of conditions, to represent the set of records that satisfy the condition (or set of conditions) A. If a record x is in coverset(A), we say that x is covered by A. If A is empty (∅), coverset(A) includes all the records in the database.
6. Coverage(A) is the number of records covered by A: coverage(A) = |coverset(A)|.
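The definitions of coverset, coverage, and the parent relation above can be illustrated with a minimal Python sketch. It is not taken from the paper: it assumes that each condition is an (attribute, value) equality test and that an antecedent is a frozenset of such conditions. The function names, the toy records, and the attribute names are all hypothetical.

```python
from typing import Dict, FrozenSet, List, Tuple

# Assumption: a condition is an (attribute, value) equality test,
# and an antecedent is a set (conjunction) of such conditions.
Condition = Tuple[str, object]
Antecedent = FrozenSet[Condition]
Record = Dict[str, object]


def coverset(antecedent: Antecedent, dataset: List[Record]) -> List[Record]:
    """Records that satisfy every condition in the antecedent.

    all() over an empty antecedent is True, so the empty antecedent
    covers every record in the dataset.
    """
    return [
        record
        for record in dataset
        if all(record.get(attr) == value for attr, value in antecedent)
    ]


def coverage(antecedent: Antecedent, dataset: List[Record]) -> int:
    """coverage(A) = |coverset(A)|."""
    return len(coverset(antecedent, dataset))


def is_parent(a: Antecedent, b: Antecedent) -> bool:
    """A -> C is a parent of B -> C if A is a proper subset of B."""
    return a < b


def is_direct_parent(a: Antecedent, b: Antecedent) -> bool:
    """A parent is direct when |A| = |B| - 1."""
    return a < b and len(a) == len(b) - 1


# Hypothetical toy dataset; "income" plays the role of a target variable.
records: List[Record] = [
    {"sex": "F", "smoker": "yes", "income": 30},
    {"sex": "F", "smoker": "no", "income": 45},
    {"sex": "M", "smoker": "yes", "income": 50},
    {"sex": "M", "smoker": "no", "income": 40},
]

empty: Antecedent = frozenset()
a: Antecedent = frozenset({("sex", "F")})
b: Antecedent = frozenset({("sex", "F"), ("smoker", "yes")})

print(coverage(empty, records), coverage(a, records), coverage(b, records))
print(is_parent(a, b), is_direct_parent(a, b), is_direct_parent(empty, b))
```

Representing antecedents as frozensets lets the proper-subset operator `<` express the parent relation directly. Here the empty antecedent is a parent of `b` but not a direct parent, because it is two conditions shorter.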
3 Insignificant Exploratory Rules
As mentioned before, exploratory rule discovery searches for multiple models
in a database, and may lead to discovering spurious or uninteresting rules. How
to decrease the number of resulting rules becomes a problem of concern. One
approach is for the users to define a suitable set of constraints, which the
algorithm can then use to automatically discard some potentially
uninteresting rules. Another approach is to compare the resulting rules with
one another, so as to present the users with a more compact set of models. Techniques
for automatically removing potentially uninteresting rules are summarized
by Huang and Webb [8].
3.1 Improvement
Filtering insignificant rules using statistical tests is one of the interesting
topics of research. With this technique, we perform significance tests among
rules and discard those that appear interesting only by chance. To
 