$d_{E_i} \in D_E$, by summarising a common pre-defined class from its $K$
most similar instances, identified in $D_R$. To identify the $K$ most
similar training-instances for $d_{E_i}$, the Euclidean distance between
each training data-record $d_{R_i} \in D_R$ and $d_{E_i}$ is commonly
used: $Distance(d_{R_i}, d_{E_i}) = \sqrt{\sum_{j=1}^{n} (d_{R_i j} - d_{E_i j})^2}$,
where $n$ is the number of data-attributes in $D_C$, and $d_{R_i j}$ and $d_{E_i j}$
are the values of the $j$th data-attribute in $D_C$ for $d_{R_i}$ and $d_{E_i}$.
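
The distance computation and neighbour selection described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the toy training records, and the use of a simple majority vote to "summarise" the class of the K neighbours are not taken from the text, and ties are broken arbitrarily by insertion order.

```python
import math
from collections import Counter


def euclidean_distance(record_a, record_b):
    """Euclidean distance between two equal-length numeric attribute vectors."""
    if len(record_a) != len(record_b):
        raise ValueError("records must have the same number of attributes")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(record_a, record_b)))


def knn_classify(training_set, query, k=3):
    """Return the majority class among the k training records nearest to query.

    training_set is a list of (attribute_vector, class_label) pairs.
    Majority voting is an assumption made for this sketch.
    """
    neighbours = sorted(
        training_set, key=lambda pair: euclidean_distance(pair[0], query)
    )[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: two numeric attributes, two classes.
training = [
    ((1.0, 1.0), "a"),
    ((1.2, 0.8), "a"),
    ((0.9, 1.1), "a"),
    ((5.0, 5.0), "b"),
    ((5.2, 4.9), "b"),
]
predicted = knn_classify(training, (1.1, 1.0), k=3)
```

Sorting the whole training set is the simplest choice for a sketch; for large training sets an index structure or a partial sort would normally be preferred.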
Support Vector Machine: The objective of using a Support Vector Machine
(SVM) [6] is to find a hypothesis $h$ which minimises the true error, defined
as the probability that $h$ produces an erroneous result. SVMs make use of
linear functions of the form $f(x) = w^T x + b$, where $w$ is the weight vector,
$x$ is the input vector, and $w^T x$ is the inner product between $w$ and $x$. The
main concept of SVM is to select a hyperplane that separates the positive
and negative examples while maximising the smallest margin. Standard
SVM techniques produce binary classifiers. Two common approaches to
support the application of SVM techniques to the multi-class problem are
One Against All (OAA) and One Against One (OAO).
2.3 Classification Association Rule Mining
An overlap between ARM and CRM is CARM, which strategically solves
the traditional CRM problem by applying ARM techniques. It mines a set of
CARs from $D_{TR}$. A CAR is an AR of the form $X \Rightarrow c_i$, where $X$ is an FI mined
from $D_{TR}$, and $c_i$ is a pre-defined class in $C$ to which data-records can be
assigned. The idea of CARM was first presented in [3]. Subsequently a number
of alternative approaches have been described. Broadly CARM algorithms
can be categorised into two groups according to the way that the CRs are
generated:
- Two-stage algorithms, where a set of CARs is first produced (stage 1)
and then pruned and placed into a classifier (stage 2). Examples
of this approach include CBA [38] and CMAR [36]. CBA (Classification
Based on Associations), developed by Liu et al. in 1998, is an Apriori [2]
based CARM algorithm, which (1) applies its CBA-GR procedure for CAR
generation; and (2) applies its CBA-CB procedure to build a classifier
based on the generated CARs. CMAR (Classification based on Multiple
Association Rules), introduced by Li, Han and Pei in 2001, is similar to CBA
but generates CARs through an FP-tree [27] based approach.
- Integrated algorithms where the classifier is produced in a single processing
step. Examples of this approach include TFPC² [15,18], and induction systems
such as FOIL [46], PRM and CPAR [53]. TFPC (Total From Partial
Classification), proposed by Coenen et al. in 2004, is an Apriori-TFP [16]
based CARM algorithm, which generates CARs by efficiently constructing
both P-tree and T-tree set enumeration tree structures. FOIL
2 TFPC may be obtained from http://www.csc.liv.ac.uk/frans/KDD/Software.
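
As a rough illustration of the two-stage scheme described in this section, the Python sketch below first generates rules of the form $X \Rightarrow c_i$ that meet support and confidence thresholds, then orders them and classifies a record with the first matching rule. It is not CBA-GR, CBA-CB, or any of the published algorithms: candidate itemsets are enumerated by brute force rather than with Apriori or an FP-tree, and the thresholds, the ordering criteria, the default class, and the toy records are all assumptions made for this example.

```python
from collections import Counter
from itertools import combinations


def mine_cars(records, min_support, min_confidence, max_size=2):
    """Stage 1 (brute force): find rules X => c meeting both thresholds.

    records is a list of (set_of_items, class_label) pairs.
    Returns tuples (antecedent, class_label, support, confidence), best first.
    """
    n = len(records)
    items = sorted({item for itemset, _ in records for item in itemset})
    rules = []
    for size in range(1, max_size + 1):
        for antecedent in combinations(items, size):
            x = frozenset(antecedent)
            covered = [label for itemset, label in records if x <= itemset]
            if not covered:
                continue
            for label, count in Counter(covered).items():
                support = count / n                # fraction of records with X and c
                confidence = count / len(covered)  # fraction of X-records with c
                if support >= min_support and confidence >= min_confidence:
                    rules.append((x, label, support, confidence))
    # Assumed ordering: higher confidence, then higher support, then shorter X.
    rules.sort(key=lambda r: (-r[3], -r[2], len(r[0]), sorted(r[0]), r[1]))
    return rules


def classify(rules, itemset, default_class):
    """Stage 2 (simplified): apply the first rule whose antecedent matches."""
    for antecedent, label, _, _ in rules:
        if antecedent <= itemset:
            return label
    return default_class


# Hypothetical training records: item sets with a class label.
records = [
    ({"a", "b"}, "yes"),
    ({"a", "c"}, "yes"),
    ({"a", "b", "c"}, "yes"),
    ({"b", "d"}, "no"),
    ({"c", "d"}, "no"),
]
rules = mine_cars(records, min_support=0.4, min_confidence=0.8)
label = classify(rules, {"c", "d"}, default_class="yes")
```

An integrated algorithm would instead interleave rule generation and classifier construction in a single pass, so no separate pruning step over a full rule set is needed.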
 