Efficiently Identifying Exploratory Rules’ Significance - Data Mining: Theory, Methodology, Techniques, and Applications


Efficiently Identifying Exploratory Rules' Significance

Shiying Huang and Geoffrey I. Webb

School of Computer Science and Software Engineering,

Monash University, Melbourne VIC 3800, Australia

{Shiying.Huang, Geoff.Webb}@infotech.monash.edu.au

Abstract. Efficiently discarding potentially uninteresting rules in exploratory rule discovery is an important research focus in data mining. Many researchers have presented algorithms that automatically remove potentially uninteresting rules using background knowledge and user-specified constraints. Identifying the significance of exploratory rules with a significance test is desirable for removing rules that may appear interesting by chance, which gives users a more compact set of resulting rules. However, applying statistical tests to identify significant rules requires considerable computation and data access to obtain the necessary statistics, and the situation gets worse as the size of the database increases. In this paper, we propose two approaches for improving the efficiency of significant exploratory rule discovery. We also evaluate their effect experimentally in impact rule discovery, which is suitable for discovering exploratory rules in very large, dense databases.

Keywords: Exploratory rule discovery, impact rule, rule significance, interestingness measure.

1 Introduction

Exploratory rule discovery techniques seek multiple models that efficiently describe the potentially interesting inter-relationships among attributes in a database. Searching for multiple models instead of a single model often results in numerous spurious or uninteresting rules.
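As a rough illustration of why searching for many models yields spurious rules, the sketch below assumes that every candidate rule is tested at significance level alpha and that none of them reflects a real relationship. Both assumptions are simplifications, and the function names are hypothetical. Under them, the expected number of rules that pass the test purely by chance grows linearly with the number of rules tested. If the tests are also independent, the probability of at least one such rule quickly approaches 1.

```python
def expected_false_discoveries(num_rules: int, alpha: float = 0.05) -> float:
    """Expected number of rules with no real effect that still pass a test
    at level alpha (linearity of expectation; independence is not needed)."""
    return num_rules * alpha


def prob_at_least_one_false_discovery(num_rules: int, alpha: float = 0.05) -> float:
    """Probability that at least one of num_rules *independent* tests of
    rules with no real effect is significant at level alpha."""
    return 1.0 - (1.0 - alpha) ** num_rules


print(expected_false_discoveries(1000))  # 50.0
print(round(prob_at_least_one_false_discovery(100), 3))  # 0.994
```

Even a modest search over 100 candidate rules is therefore almost certain to surface some rules that look interesting only by chance.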

Automatically discarding statistically insignificant rules has been an important issue in exploratory rule discovery research, and several papers have been devoted to this topic. Bay and Pazzani [4], Liu et al. [10] and Webb [15] developed techniques for identifying insignificant rules with qualitative attributes only (or discretized quantitative attributes). Aumann and Lindell [2] and Huang and Webb [8] both studied the significance of exploratory rules with undiscretized quantitative attributes as the consequent.
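To make the statistics behind such a test concrete, here is a minimal sketch built on one plausible setup. It is not necessarily the test used in [2] or [8]. The setup compares the mean of the quantitative target among records covered by a rule with its mean among records covered by a more general antecedent but not by the rule itself. The names `RunningStats`, `collect` and `welch_test_normal_approx` are hypothetical. The count, mean and variance of each group are accumulated in a single scan using Welford's update. The p-value uses a normal approximation to Welch's t statistic, which is only reasonable when both groups are large.

```python
import math
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Tuple

Record = Dict[str, float]


@dataclass
class RunningStats:
    """Count, mean and sum of squared deviations, updated in one pass (Welford)."""
    n: int = 0
    mean: float = 0.0
    m2: float = 0.0

    def add(self, x: float) -> None:
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    @property
    def variance(self) -> float:
        # Sample variance (n - 1 denominator); raises for n < 2.
        return self.m2 / (self.n - 1)


def collect(records: Iterable[Record],
            covers_rule: Callable[[Record], bool],
            covers_general: Callable[[Record], bool],
            target: str) -> Tuple[RunningStats, RunningStats]:
    """Single scan over the data. covers_rule is assumed to be a specialization
    of covers_general, so covered records split into two disjoint groups."""
    rule_stats, rest_stats = RunningStats(), RunningStats()
    for rec in records:
        if not covers_general(rec):
            continue
        if covers_rule(rec):
            rule_stats.add(rec[target])
        else:
            rest_stats.add(rec[target])
    return rule_stats, rest_stats


def welch_test_normal_approx(a: RunningStats, b: RunningStats) -> Tuple[float, float]:
    """Welch's t statistic with a two-sided p-value from the standard normal
    distribution (an approximation assumed adequate for large groups)."""
    se = math.sqrt(a.variance / a.n + b.variance / b.n)
    t = (a.mean - b.mean) / se
    p = math.erfc(abs(t) / math.sqrt(2.0))
    return t, p


# Synthetic data: the rule "a = 1 and b = 1" adds 5.0 to profit
# relative to the general antecedent "a = 1".
records = [
    {"a": float(i % 2 == 0), "b": float(i % 4 == 0),
     "profit": 10.0 + i % 7 + (5.0 if i % 4 == 0 else 0.0)}
    for i in range(400)
]
rule, rest = collect(records,
                     lambda r: r["a"] == 1 and r["b"] == 1,
                     lambda r: r["a"] == 1,
                     "profit")
t, p = welch_test_normal_approx(rule, rest)
print(rule.n, rest.n)  # 100 100
```

Keeping only these running totals avoids storing the covered target values. Each additional rule to be tested still needs its own group statistics, however, and gathering them is where the data access costs discussed next arise.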

When filtering insignificant exploratory rules involving quantitative attributes, rule discovery systems have to scan the database several times to collect the parameters needed for the significance test. Moreover, considerable CPU time has to be spent on data access and looking for
