Mining Efficiently Significant Classification Association Rules - Data Mining: Foundations and Practice

Databases Reference

In-Depth Information

Mining E ciently Significant Classification

Association Rules

Yanb o J. Wang 1 , Qin Xin 2 , and Frans Coenen 1

1

Department of Computer Science, University of Liverpool, Ashton Building,

Ashton Street, Liverpool, L69 3BX, UK

jwang@csc.liv.ac.uk, frans@csc.liv.ac.uk

2

Department of Informatics, University of Bergen, P.B.7800,

N-5020 Bergen, Norway

xin@ii.uib.no

Summary. Classification Rule Mining (CRM) is a well-known Data Mining tech-

nique for the extraction of hidden Classification Rules (CRs) from a given database

that is coupled with a set of pre-defined classes, the objective being to build a clas-

sifier to classify “unseen” data-records. One recent approach to CRM is to employ

Association Rule Mining (ARM) techniques to identify the desired CRs, i.e. Clas-

sification Association Rule Mining (CARM). Although the advantages of accuracy

and e ciency offered by CARM have been established in many papers, one major

drawback is the large number of Classification Association Rules (CARs) that may

be generated - up to a maximum of “2 n

− n − 1” in the worst case, where n repre-

sents the number of data-attributes in a database. However, there are only a limited

number, say at most k in each class, of CARs that are required to distinguish be-

tween classes. The problem addressed in this chapter is how to e ciently identify

the k such CARs. Having a CAR list that is generated from a given database, based

on the well-established “Support-Confidence” framework, a rule weighting scheme is

proposed in this chapter, which assigns a score to a CAR that evaluates how signifi-

cantly this CAR contributes to a single pre-defined class. Consequently a rule mining

approach is presented, that addresses the above, that operates in time O ( k 2 n 2 )inits

deterministic fashion, and O ( kn ) in its randomised fashion, where k represents the

number of CARs in each class that are potentially significant to distinguish between

classes and k ≥

k ; as opposed to exponential time O (2 n ) - the time required in

score computation to mine all k CARs in a “one-by-one” manner. The experimental

results show good performance regarding the accuracy of classification when using

the proposed rule weighting scheme with a suggested rule ordering mechanism, and

evidence that the proposed rule mining approach performs well with respect to the

e ciency of computation.

Search WWH ::

Custom Search

Home