Database Reference
Taken together, the research issues above raise a challenge: can an association
rule based classifier with any number of predicates in the antecedent be developed
for quantitative attributes by means of clustering, so as to overcome the
limitations caused by discretization?
In this paper, to resolve this problem, we present a new approach to
classification over quantitative data in high dimensional databases, called
MOUCLAS (MOUntain function based CLASsification), based on the concept of
the fuzzy set membership function. It aims to integrate the advantages of
classification, clustering and association rule mining to identify interesting patterns
in selected sample data sets.
2 Problem Statement
We now give a formal statement of the problem of MOUCLAS Patterns (MPs)
and introduce some definitions.
The MOUCLAS algorithm, like ARCS, assumes that the initial association
rules can be agglomerated into clustering regions while obeying the anti-monotone
rule constraint. Our proposed framework assumes that the training dataset $D$ is a
normal relational set of transactions $d \in D$. Each transaction $d$ is described by
attributes $A_j$, $j = 1$ to $l$. The dimension of $D$ is $l$, the number of attributes used in $D$.
This allows us to describe a database in terms of volume and dimension. $D$ can be
classified into a set of known classes $Y$, with class labels $y \in Y$. The value of an
attribute must be quantitative. In this work, we treat all the attributes uniformly. We can
treat a transaction as a set of (attribute, value) pairs and a class label. We call each
(attribute, value) pair an item. A set of items is simply called an itemset.
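The data model just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `Transaction`, `itemset` and `dimension`, the attribute names `A1` and `A2`, and the sample values are all hypothetical and do not come from the paper.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Tuple

Item = Tuple[str, float]  # one (attribute, value) pair, i.e. an item


@dataclass
class Transaction:
    """A transaction d: quantitative attribute values plus a class label y."""
    values: Dict[str, float]
    label: str

    def itemset(self) -> FrozenSet[Item]:
        # The transaction viewed as a set of (attribute, value) items.
        return frozenset(self.values.items())


def dimension(dataset: List[Transaction]) -> int:
    """Dimension l of D: the number of attributes used (0 for an empty D)."""
    return len(dataset[0].values) if dataset else 0


# Hypothetical training set D with l = 2 attributes and two classes.
D = [
    Transaction({"A1": 0.5, "A2": 3.2}, "y1"),
    Transaction({"A1": 0.7, "A2": 2.9}, "y1"),
    Transaction({"A1": 4.1, "A2": 0.3}, "y2"),
]
volume = len(D)      # number of transactions in D
l = dimension(D)     # number of attributes in D
```

Here `volume` and `l` correspond to the volume and dimension used to describe the database.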
In this paper, we propose two novel classifiers, called De-MP and J-MP, which
exploit the discriminating ability of MOUCLAS Patterns (MPs) and Jumping
MOUCLAS Patterns (JMPs).
A MOUCLAS Pattern (MP) is an implication of the form:
$$Cluster(D)_t \rightarrow y,$$
where $Cluster(D)_t$ is a cluster of $D$, $t = 1$ to $m$, and $y$ is a class label.
The frequency and accuracy of MOUCLAS Patterns are defined as follows.
An MP that satisfies the minimum support is frequent, where an MP has support $s$
if $s\%$ of the transactions in $D$ belong to $Cluster(D)_t$ and are labeled with class $y$.
An MP that satisfies a pre-specified minimum confidence is called accurate, where an MP
has confidence $c$ if $c\%$ of the transactions belonging to $Cluster(D)_t$ are labeled with
class $y$.
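As a minimal sketch of these two definitions, the following Python function computes the support and confidence (both as percentages) of a single MP. It assumes that cluster membership has already been determined by some clustering step and is supplied as one cluster index per transaction; the function name, its parameters, and the sample data are hypothetical.

```python
from typing import Sequence, Tuple


def mp_support_confidence(
    cluster_ids: Sequence[int],
    labels: Sequence[str],
    t: int,
    y: str,
) -> Tuple[float, float]:
    """Support and confidence, in percent, of the pattern Cluster(D)_t -> y.

    cluster_ids[i] is the (assumed, precomputed) cluster index of transaction i
    and labels[i] is its class label.
    """
    if len(cluster_ids) != len(labels):
        raise ValueError("cluster_ids and labels must have the same length")
    n = len(labels)
    # Transactions that belong to Cluster(D)_t.
    in_cluster = sum(1 for c in cluster_ids if c == t)
    # Transactions that belong to Cluster(D)_t and are labeled y.
    both = sum(1 for c, lab in zip(cluster_ids, labels) if c == t and lab == y)
    support = 100.0 * both / n if n else 0.0
    confidence = 100.0 * both / in_cluster if in_cluster else 0.0
    return support, confidence


# Hypothetical data: five transactions, two clusters, two classes.
cluster_ids = [1, 1, 1, 2, 2]
labels = ["a", "a", "b", "b", "b"]
support, confidence = mp_support_confidence(cluster_ids, labels, t=1, y="a")
```

With this data, two of the five transactions are in cluster 1 and labeled `a`, so the support is 40%; two of the three transactions in cluster 1 are labeled `a`, so the confidence is about 66.7%. The MP is frequent or accurate when these values reach the chosen minimum support or minimum confidence.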
We also adopt the concept of reliability [12] to describe the correlation. The measure
of reliability of the association rule $A \Rightarrow B$ can be defined as:
$$R(A \Rightarrow B) = \frac{P(A \wedge B)}{P(A)} - P(B)$$
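For illustration, the reliability measure can be estimated from transaction counts as in the sketch below. It assumes the signed-difference reading of the formula, that is, the conditional probability of B given A minus the unconditional probability of B; the function name, its parameters, and the counts are hypothetical.

```python
def reliability(n_total: int, n_a: int, n_b: int, n_ab: int) -> float:
    """Estimate R(A => B) = P(A and B) / P(A) - P(B) from counts.

    n_total: number of transactions
    n_a:     transactions satisfying A
    n_b:     transactions satisfying B
    n_ab:    transactions satisfying both A and B
    """
    if n_total <= 0 or n_a <= 0:
        raise ValueError("n_total and n_a must be positive")
    # P(A and B) / P(A) reduces to n_ab / n_a.
    p_b_given_a = n_ab / n_a
    p_b = n_b / n_total
    return p_b_given_a - p_b


# Hypothetical counts: 100 transactions, 40 with A, 50 with B, 30 with both.
r = reliability(n_total=100, n_a=40, n_b=50, n_ab=30)
```

In this example P(B | A) = 0.75 and P(B) = 0.5, so `r` is 0.25: knowing A raises the estimated probability of B by 25 percentage points.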
Since $R$ is the difference between the conditional probability of $B$ given $A$ and the
unconditional probability of $B$, it measures the effect of available information of $A$ on the