probability of the association rule. Correspondingly, the greater R is, the stronger the MOUCLAS patterns are, which means that the occurrence of Cluster(D)_t more strongly implies the occurrence of y. Therefore, we can utilize reliability to further prune the selected frequent, accurate, and reliable MOUCLAS patterns (MPs), to identify the truly interesting MPs, and to make the discovered MPs more understandable. An MP satisfying the minimum reliability is called reliable, where the MP has reliability defined by the above formula.
Given a set of transactions, D, the problem of De-MP is to discover the MPs whose support, confidence, and reliability are greater than the user-specified minimum support threshold (called minsup) [13], minimum confidence threshold (called minconf) [13], and minimum reliability threshold (called minR), respectively, and to construct a classifier based upon those MPs.
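
The selection step in this problem statement can be sketched as a simple filter. This is only an illustration, not the authors' implementation: the `MP` record, its field names, and the sample values are hypothetical, and support, confidence, and reliability are assumed to have been computed beforehand with the formulas given earlier. The strict "greater than" comparison follows the wording above; applying it to reliability as well is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MP:
    """Hypothetical record for one MOUCLAS pattern with precomputed measures."""
    cluster_id: int
    label: str
    support: float
    confidence: float
    reliability: float


def select_mps(patterns, minsup, minconf, min_r):
    """Keep the patterns whose three measures all exceed their thresholds."""
    return [
        p for p in patterns
        if p.support > minsup
        and p.confidence > minconf
        and p.reliability > min_r
    ]


# Made-up candidate patterns, for illustration only.
candidates = [
    MP(0, "edible", support=0.30, confidence=0.90, reliability=0.40),
    MP(1, "edible", support=0.05, confidence=0.95, reliability=0.50),
    MP(2, "poisonous", support=0.25, confidence=0.60, reliability=0.10),
]

selected = select_mps(candidates, minsup=0.10, minconf=0.80, min_r=0.20)
print([p.cluster_id for p in selected])
```

With these made-up values, only the first candidate passes all three thresholds: the second fails on support, and the third fails on confidence and reliability.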
A Jumping MOUCLAS Pattern (JMP) can be further defined based on the notion of the Jumping Emerging Pattern [6] (JEP) and MP. A JEP is an itemset whose support increases significantly from 0 in a class (say, the poisonous class in the mushroom data from the UCI repository) to a user-specified value in another class (say, the edible class). We can then use a JEP as an index for dimensionality reduction. For each JEP in a certain class y, only the attributes of the JEP will be kept for all the transactions in the class y. We then perform the clustering on those transactions.
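
The JEP test and the attribute projection described above might look as follows. This is a minimal sketch under stated assumptions: a transaction is modeled as a frozenset of hypothetical (attribute, value) pairs, the function names are invented for illustration, and the toy records below are made up rather than taken from the UCI mushroom data.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item of the itemset."""
    if not transactions:
        return 0.0
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def is_jep(itemset, background, target, min_target_support):
    """True if support is 0 in `background` and reaches the threshold in `target`."""
    return (support(itemset, background) == 0
            and support(itemset, target) >= min_target_support)


def project(transactions, jep):
    """Keep only the attributes that occur in the JEP, for every transaction."""
    jep_attrs = {attr for attr, _ in jep}
    return [frozenset((a, v) for a, v in t if a in jep_attrs)
            for t in transactions]


# Toy data, illustrative only.
poisonous = [
    frozenset({("odor", "foul"), ("cap", "flat")}),
    frozenset({("odor", "foul"), ("cap", "convex")}),
]
edible = [
    frozenset({("odor", "almond"), ("cap", "flat")}),
    frozenset({("odor", "almond"), ("cap", "convex")}),
    frozenset({("odor", "none"), ("cap", "flat")}),
]

jep = frozenset({("odor", "almond")})
print(is_jep(jep, poisonous, edible, min_target_support=0.5))
reduced = project(edible, jep)
print(reduced)
```

Here the candidate itemset never occurs in the poisonous class and occurs in two of the three edible transactions, so it qualifies as a JEP for the chosen threshold. The projection then keeps only the `odor` attribute for all edible transactions, including the one that does not contain the JEP itself, and clustering would run on that reduced data.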
Let C denote the dataset of transactions d labeled with class y after dimensionality reduction processing by JEPs. A JMP can be defined as a cluster_rule, namely a rule: cluset → y, where cluset is a set of itemsets from a cluster Cluster(C)_t, which is obtained from the clustering on the same class of transactions after dimensionality reduction via JEP, and y is a class label, y ∈ Y. Let JMPset denote a set of JMPs which correspond to the same JEP.
Suppose the number of transactions of C in cluset is cluCount, and the number of transactions in C is clasCount. The support of the transactions d belonging to cluset in C, denoted as subsup, can then be defined by the formula:

$$\mathit{subsup} = \frac{\mathit{cluCount}}{\mathit{clasCount}}$$
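
As a small worked illustration of this ratio, the sketch below computes subsup for every cluster from a list of cluster assignments. The function name and the sample assignments are hypothetical; the only thing taken from the text is the definition subsup = cluCount / clasCount, where clasCount is the number of transactions in C.

```python
from collections import Counter


def subsup_by_cluster(cluster_ids):
    """Map each cluster id to cluCount / clasCount for the transactions in C."""
    clas_count = len(cluster_ids)  # number of transactions in C
    if clas_count == 0:
        raise ValueError("C must contain at least one transaction")
    return {cid: clu_count / clas_count
            for cid, clu_count in Counter(cluster_ids).items()}


# Hypothetical cluster assignment of the 8 transactions in C.
assignments = [0, 0, 0, 1, 1, 2, 2, 2]
result = subsup_by_cluster(assignments)
print(result)
```

Because every transaction of C falls into exactly one cluster in this sketch, the subsup values sum to 1.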
Given a set of transactions, D, the problem of J-MP is to discover all JMPs, calculate their subsup, and construct a classifier based upon the JMPs.
3 The MOUCLAS-1 Algorithm
The classification technique, MOUCLAS-1, consists of two steps:
1. Discovery of frequent, accurate and reliable MPs.
2. Construction of a classifier, called De-MP, based on MPs.
The core of the first step in the MOUCLAS-1 algorithm is to find all cluster_rules that have support above minsup. Let C denote the dataset D after dimensionality reduction processing. A cluster_rule represents an MP, namely a rule: