the set of records which is covered by the antecedent of a rule. For example,
Huang and Webb [8] have shown that discovering the top 1000 significant impact
rules generally takes much longer than discovering the top 1000 impact rules
without any filter, especially when most of the top 1000 impact rules are
insignificant. The same paper presents a technique for improving the efficiency
of the insignificance filter by introducing the triviality filter, which exploits
the anti-monotonicity of triviality to prune the search space effectively.
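The pruning idea behind the triviality filter can be sketched as follows. This is a minimal illustration, not the procedure of [8]. For illustration only, we assume that an antecedent is "trivial" when dropping one of its conditions leaves its coverage (the set of covered records) unchanged. Under that assumed definition, triviality is anti-monotone: if cover(A) = cover(A \ {x}), then for any superset S of A, cover(S) = cover(S \ {x}). Every descendant of a trivial node in a set-enumeration search is therefore trivial too, and its whole subtree can be skipped. The helper names `coverage`, `is_trivial` and `search` are hypothetical.

```python
def coverage(antecedent, records):
    """Indices of the records that satisfy every condition in the antecedent."""
    return frozenset(i for i, record in enumerate(records) if antecedent <= record)


def is_trivial(antecedent, records):
    """Hypothetical triviality test (an assumption for illustration):
    some condition can be dropped without changing the coverage."""
    cov = coverage(antecedent, records)
    return any(coverage(antecedent - {c}, records) == cov for c in antecedent)


def search(items, records, max_size):
    """Depth-first set-enumeration search over antecedents.

    A trivial candidate is discarded together with its whole subtree:
    by anti-monotonicity every antecedent in that subtree is trivial too.
    """
    kept = []

    def expand(antecedent, start):
        for k in range(start, len(items)):
            candidate = antecedent | {items[k]}
            if is_trivial(candidate, records):
                continue  # prune: no descendant of candidate can be non-trivial
            kept.append(candidate)
            if len(candidate) < max_size:
                expand(candidate, k + 1)

    expand(frozenset(), 0)
    return kept


records = [{"a", "b"}, {"a", "b", "c"}, {"c"}]
print(search(["a", "b", "c"], records, max_size=3))
```

In this toy data, {a, b} covers exactly the records covered by {b} alone, so it is discarded and {a, b, c} is never generated. Each remaining antecedent still costs one coverage computation per condition, and that cost is what motivates further efficiency work on the filter.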
There is a pressing need to improve the efficiency of the insignificance filter
for distributional-consequent exploratory rule discovery, even after the
introduction of the triviality filter. In this paper, we propose two approaches
for improving efficiency in exploratory rule discovery, which can substantially
reduce the computation required to discover significant rules. Although we
demonstrate them on impact rule discovery, these techniques can also be recast
for other exploratory rule discovery tasks.
The paper is organized as follows. In section 2, we introduce the concepts
and notation of exploratory rule discovery. Existing techniques for discarding
insignificant exploratory rules are reviewed in section 3, followed by a brief
description of impact rule discovery in section 4. The techniques for improving
efficiency are presented in section 5. Section 6 provides experimental results
and evaluations, and conclusions are drawn in section 7.
2 Exploratory Rule Discovery
Traditional machine learning systems discover a single model from the available
data that is expected to maximize the accuracy or some other specific measure
of performance on unknown future data. Predictions or classifications are then
made on the basis of this single model [15]. Examples include decision trees
[12], decision rules [11] and the Naive-Bayes classifier. However, there often
exist alternative models that perform as well as the one selected by the system,
so it is not always sensible to choose only one of the “best” models. The
criteria for deciding whether a model is best also vary with the context of
application. Exploratory rule discovery techniques address this problem by
searching for multiple models that satisfy certain constraints and presenting
all of them to the user, who is thus offered alternative choices and greater
flexibility.
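The contrast between selecting one model and exploring many can be sketched in a few lines. This is a minimal brute-force illustration under stated assumptions, not the search method of this paper. The constraints (a minimum number of covered records and a minimum confidence for rules predicting a single target item), the ranking and the function name `discover_rules` are all hypothetical choices made for the example.

```python
from itertools import combinations


def discover_rules(records, target, items, max_size=2,
                   min_coverage=2, min_confidence=0.6, k=10):
    """Return up to k rules antecedent -> target that satisfy the
    (assumed) constraints, instead of a single 'best' model."""
    rules = []
    for size in range(1, max_size + 1):
        for antecedent in combinations(sorted(items), size):
            # Records covered by the antecedent.
            covered = [r for r in records if set(antecedent) <= r]
            if len(covered) < min_coverage:
                continue
            hits = sum(1 for r in covered if target in r)
            confidence = hits / len(covered)
            if confidence >= min_confidence:
                rules.append((antecedent, len(covered), confidence))
    # Rank by confidence, then coverage; every qualifying rule is kept
    # until the top-k cut, and the user chooses among them.
    rules.sort(key=lambda rule: (-rule[2], -rule[1], rule[0]))
    return rules[:k]


records = [{"x", "y", "t"}, {"x", "t"}, {"y"}, {"x", "y", "t"}, {"y", "z"}]
for rule in discover_rules(records, "t", ["x", "y", "z"]):
    print(rule)
```

Here both x -> t and {x, y} -> t satisfy the constraints and are returned together. Exhaustive enumeration like this grows exponentially with the number of items, which is why practical exploratory systems rely on pruning.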
Exploratory rule discovery techniques [8] are classified into propositional rule
discovery, which seeks rules with qualitative attributes or discretized
quantitative attributes only, and distributional-consequent rule discovery, which
seeks rules with quantitative attributes as the consequent. The performance
status of such quantitative attributes is described by their distributions.
Association rule discovery [1] and contrast set discovery [4] are examples of
propositional exploratory rule discovery, while impact rule discovery [13] and
quantitative association rule discovery [2] both belong to the class of
distributional-consequent rule discovery. It is argued that distributional-consequent rules are able to provide better
 