Mining Efficiently Significant Classification Association Rules - Data Mining: Foundations and Practice


$d_{E_i} \in D_E$, by summarising a common pre-defined class from its $K$ most similar instances, identified in $D_R$. To identify the $K$ most similar training-instances for $d_{E_i}$, calculating the Euclidean distance value between each training data-record $d_{R_i} \in D_R$ and $d_{E_i}$ has been commonly used: $\mathrm{Distance}(d_{R_i}, d_{E_i}) = \sqrt{\sum_{j=1} (d_{R_i j} - d_{E_i j})^2}$, where $d_{R_i j}$ and $d_{E_i j}$ are the values of the $j$th data-attribute in $D_C$ for $d_{R_i}$ and $d_{E_i}$.

• Support Vector Machine: The objective of using a Support Vector Machine (SVM) [6] is to find a hypothesis $h$ that minimises the true error, defined as the probability that $h$ produces an erroneous result. SVMs make use of linear functions of the form $f(x) = w^T x + b$, where $w$ is the weight vector, $x$ is the input vector, and $w^T x$ is the inner product between $w$ and $x$. The main concept of SVM is to select a hyperplane that separates the positive and negative examples while maximising the smallest margin. Standard SVM techniques produce binary classifiers. Two common approaches to support the application of SVM techniques to the multi-class problem are One Against All (OAA) and One Against One (OAO).
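The two techniques described in the bullets above can be illustrated with a minimal Python sketch. It is not taken from the chapter: the function names, the toy records and the hand-picked weight vectors are all assumptions made for illustration. The nearest-neighbour part uses the Euclidean distance and a majority vote among the K closest training records; the SVM part only evaluates the linear function f(x) = w^T x + b for one already-given (w, b) pair per class and picks the highest score, which is one simple reading of One Against All. Training, that is, finding the maximum-margin hyperplane, is not shown.

```python
import math
from collections import Counter


def euclidean_distance(a, b):
    """Square root of the summed squared per-attribute differences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_classify(training, record, k):
    """Return the most common class among the k training records nearest to `record`.

    `training` is a list of (attribute_vector, class_label) pairs.
    """
    nearest = sorted(training, key=lambda pair: euclidean_distance(pair[0], record))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def linear_score(w, b, x):
    """Evaluate f(x) = w^T x + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def one_against_all(models, x):
    """Pick the class whose binary scorer gives the largest value.

    `models` maps each class label to a (w, b) pair that is assumed
    to have been trained elsewhere.
    """
    return max(models, key=lambda c: linear_score(*models[c], x))


# Hypothetical toy data: two numeric attributes, two classes.
training = [
    ((1.0, 1.0), "a"),
    ((1.2, 0.8), "a"),
    ((0.9, 1.1), "a"),
    ((5.0, 5.0), "b"),
    ((5.5, 4.5), "b"),
]

# Hand-picked (not learned) weights, one binary scorer per class.
models = {
    "a": ((-1.0, -1.0), 3.0),
    "b": ((1.0, 1.0), -7.0),
}

print(knn_classify(training, (1.1, 1.0), k=3))
print(one_against_all(models, (5.0, 5.0)))
```

One Against One would instead hold one (w, b) pair per pair of classes and let the pairwise winners vote; the scoring function stays the same.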

2.3 Classification Association Rule Mining

An overlap between ARM and CRM is CARM, which strategically solves the traditional CRM problem by applying ARM techniques. It mines a set of CARs from $D_{TR}$. A CAR is an AR of the form $X \Rightarrow c_i$, where $X$ is an FI mined from $D_{TR}$, and $c_i$ is a pre-defined class in $C$ to which data-records can be assigned. The idea of CARM was first presented in [3]. Subsequently a number of alternative approaches have been described. Broadly, CARM algorithms can be categorised into two groups according to the way that the CRs are generated:

• Two stage algorithms, where CARs are produced first (stage 1) and then pruned and placed into a classifier (stage 2). Examples of this approach include CBA [38] and CMAR [36]. CBA (Classification Based on Associations), developed by Liu et al. in 1998, is an Apriori [2] based CARM algorithm, which (1) applies its CBA-RG procedure for CAR generation; and (2) applies its CBA-CB procedure to build a classifier based on the generated CARs. CMAR (Classification based on Multiple Association Rules), introduced by Li, Han and Pei in 2001, is similar to CBA but generates CARs through an FP-tree [27] based approach.

• Integrated algorithms, where the classifier is produced in a single processing step. Examples of this approach include TFPC² [15,18], and induction systems such as FOIL [46], PRM and CPAR [53]. TFPC (Total From Partial Classification), proposed by Coenen et al. in 2004, is an Apriori-TFP [16] based CARM algorithm, which generates CARs by efficiently constructing both P-tree and T-tree set enumeration tree structures. FOIL

² TFPC may be obtained from http://www.csc.liv.ac.uk/frans/KDD/Software.
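To make the CAR form X ⇒ c_i from Section 2.3 and the two-stage idea concrete, here is a small Python sketch. It does not reproduce CBA, CMAR or TFPC. The candidate rules are simply listed by hand rather than mined, the support and confidence measures are the usual ARM definitions and are assumed here, not quoted from this section, and the 0.6 confidence threshold, the "first matching rule wins" strategy and all names are hypothetical.

```python
from collections import namedtuple

# A CAR X => c: `antecedent` is the itemset X, `label` is the class c.
CAR = namedtuple("CAR", ["antecedent", "label"])


def support_and_confidence(rule, records):
    """Score a rule against `records`, a list of (itemset, class_label) pairs.

    support    = fraction of all records that contain X and carry class c
    confidence = that count divided by the number of records containing X
    """
    covered = [label for items, label in records if rule.antecedent <= items]
    hits = sum(1 for label in covered if label == rule.label)
    support = hits / len(records)
    confidence = hits / len(covered) if covered else 0.0
    return support, confidence


def classify(ranked_rules, items, default):
    """Return the class of the first rule whose antecedent is contained in `items`."""
    for rule in ranked_rules:
        if rule.antecedent <= items:
            return rule.label
    return default


# Hypothetical training records.
records = [
    (frozenset({"a", "b"}), "yes"),
    (frozenset({"a", "c"}), "yes"),
    (frozenset({"a", "b", "c"}), "no"),
    (frozenset({"b", "c"}), "no"),
]

# Stage 1 (stand-in): candidate CARs, listed by hand instead of mined.
candidates = [
    CAR(frozenset({"a"}), "yes"),
    CAR(frozenset({"b", "c"}), "no"),
    CAR(frozenset({"b"}), "yes"),
]

# Stage 2: prune by an assumed confidence threshold, then rank by confidence.
MIN_CONFIDENCE = 0.6
kept = [r for r in candidates if support_and_confidence(r, records)[1] >= MIN_CONFIDENCE]
ranked = sorted(kept, key=lambda r: support_and_confidence(r, records)[1], reverse=True)

print(classify(ranked, frozenset({"a", "b", "c"}), default="yes"))
```

An integrated algorithm would not keep these two stages apart; it would decide which rules enter the classifier while they are being generated.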
