Efficiently Identifying Exploratory Rules' Significance
Shiying Huang and Geoffrey I. Webb
School of Computer Science and Software Engineering,
Monash University, Melbourne VIC 3800, Australia
{Shiying.Huang, Geoff.Webb}@infotech.monash.edu.au
Abstract. How to efficiently discard potentially uninteresting rules in
exploratory rule discovery is one of the important research foci in data
mining. Many researchers have presented algorithms to automatically
remove potentially uninteresting rules utilizing background knowledge
and user-specified constraints. Identifying the significance of exploratory
rules using a significance test is desirable for removing rules that may
appear interesting by chance, hence providing the users with a more com-
pact set of resulting rules. However, applying statistical tests to identify
significant rules requires considerable computation and data access in
order to obtain the necessary statistics. The situation gets worse as the
size of the database increases. In this paper, we propose two approaches
for improving the efficiency of significant exploratory rule discovery. We
also evaluate their effect experimentally in impact rule discovery, which is
suitable for discovering exploratory rules in very large, dense databases.
Keywords: Exploratory rule discovery, impact rule, rule significance,
interestingness measure.
1 Introduction
Exploratory rule discovery techniques seek multiple models which are able to ef-
ficiently describe the potentially interesting inter-relationships among attributes
in a database. Searching for multiple models instead of a single model often
results in numerous spurious or uninteresting rules.
How to automatically discard statistically insignificant rules has been an
important issue in exploratory rule discovery research. Several papers have been
devoted to this topic. Bay and Pazzani [4], Liu et al. [10] and Webb [15]
developed techniques for identifying insignificant rules with qualitative
attributes only (or discretized quantitative attributes). Aumann and Lindell [2]
and Huang and Webb [8] both studied the significance of exploratory rules with
undiscretized quantitative attributes as the consequent.
When filtering insignificant exploratory rules involving quantitative
attributes, rule discovery systems must pass through the database several
times to collect the parameters needed for the significance test. Moreover,
considerable CPU time has to be spent on data access and looking for