Databases Reference
In-Depth Information
Mining E ciently Significant Classification
Association Rules
Yanb o J. Wang 1 , Qin Xin 2 , and Frans Coenen 1
1
Department of Computer Science, University of Liverpool, Ashton Building,
Ashton Street, Liverpool, L69 3BX, UK
jwang@csc.liv.ac.uk, frans@csc.liv.ac.uk
2
Department of Informatics, University of Bergen, P.B.7800,
N-5020 Bergen, Norway
xin@ii.uib.no
Summary. Classification Rule Mining (CRM) is a well-known Data Mining tech-
nique for the extraction of hidden Classification Rules (CRs) from a given database
that is coupled with a set of pre-defined classes, the objective being to build a clas-
sifier to classify “unseen” data-records. One recent approach to CRM is to employ
Association Rule Mining (ARM) techniques to identify the desired CRs, i.e. Clas-
sification Association Rule Mining (CARM). Although the advantages of accuracy
and e ciency offered by CARM have been established in many papers, one major
drawback is the large number of Classification Association Rules (CARs) that may
be generated - up to a maximum of “2 n
− n − 1” in the worst case, where n repre-
sents the number of data-attributes in a database. However, there are only a limited
number, say at most k in each class, of CARs that are required to distinguish be-
tween classes. The problem addressed in this chapter is how to e ciently identify
the k such CARs. Having a CAR list that is generated from a given database, based
on the well-established “Support-Confidence” framework, a rule weighting scheme is
proposed in this chapter, which assigns a score to a CAR that evaluates how signifi-
cantly this CAR contributes to a single pre-defined class. Consequently a rule mining
approach is presented, that addresses the above, that operates in time O ( k 2 n 2 )inits
deterministic fashion, and O ( kn ) in its randomised fashion, where k represents the
number of CARs in each class that are potentially significant to distinguish between
classes and k ≥
k ; as opposed to exponential time O (2 n ) - the time required in
score computation to mine all k CARs in a “one-by-one” manner. The experimental
results show good performance regarding the accuracy of classification when using
the proposed rule weighting scheme with a suggested rule ordering mechanism, and
evidence that the proposed rule mining approach performs well with respect to the
e ciency of computation.
Search WWH ::




Custom Search