Mining Efficiently Significant Classification Association Rules - Data Mining: Foundations and Practice

Databases Reference

In-Depth Information

R

is presented using CSA ordering, and N represents the size of

R

that can

be as large as “2 n

1”, a rule weighting scheme is proposed in this chapter,

which assigns a score to a CAR R j ∈R

−

n

−

that represents how significantly R j

contributes to a single class c i ∈

C . An alternative rule ordering mechanism is

consequently introduced, based on the proposed rule weighting scheme, that

aims to improve the performance of the well-established CSA ordering regard-

ing the accuracy of classification. In [19] a general framework for selectors,

namely ( k,m,n )-selectors, was proposed with applications in optimal group

testing. In this chapter, a similar concept of selectors is further considered in

a randomised setting. This randomised selector can be proved to exist with a

high probability. With regards to the concept of selectors, a novel rule mining

approach is presented that addresses the problem of mining the k “significant

rules” (see Definition 9 in Sect. 3.2) in

. The rule mining approach operates

in time O ( k 2 n 2 ) in its deterministic fashion, and O ( kn ) in its randomised fash-

ion, where k represents the number of CARs in each class that can potentially

be used to distinguish between classes and k

R

k ; as opposed to exponential

time O (2 n ) - the time required in score computation to find all k significant

(the “best k ”) rules in

≥

in a “one-by-one” manner. The experimental results

show that the proposed rule weighting and rule ordering approaches perform

well regarding the accuracy of classification, and evidence the fast computa-

tional e ciency of running the randomised rule mining approach. Note that

the deterministic rule mining approach is theoretical only.

R

1.2 Chapter Organisation

The following section describes some Data Mining work that relate to CARM.

In Sect. 3 we first introduce the rule weighting scheme together with a rule

ordering mechanism based on the rule weighting scheme; and sketch the con-

cept of deterministic selectors and give an introduction to the concept of

randomised selectors. In Sect. 4 we propose a rule mining approach to e -

ciently mine the “best

k ”CARsin

. In Sect. 5 we present experimental

results obtained using the TFPC (Total From Partial Classification) CARM

algorithm [15, 18] coupled with a “best first” case satisfaction approach. Fi-

nally we discuss our conclusions in Sect. 6, and a number of open issues for

further research.

R

2 Related Work

2.1 Association Rule Mining

ARM extracts a set of ARs from D T , first introduced in [1]. Let I =

{

a 1 ,a 2 ,...,a n− 1 ,a n }

be a set of items (data-attributes), and

T

=

{

T 1 ,T 2 ,...,

T m− 1 ,T m }

be a set of transactions (data-records), D T is described by

T

,where

Data Mining: Foundations and Practice

Search WWH ::

Custom Search

Home