probability of the association rule. Correspondingly, the greater R is, the stronger the MOUCLAS pattern is, which means that the occurrence of $\mathit{Cluster}(D)_t$ more strongly implies the occurrence of y. Therefore, reliability can be used to further prune the selected frequent and accurate MOUCLAS patterns (MPs), so that only frequent, accurate and reliable MPs remain. This identifies the truly interesting MPs and makes the discovered MPs more understandable. An MP whose reliability, as defined by the formula above, satisfies the minimum reliability threshold is called reliable.
Given a set of transactions, D, the problem of De-MP is to discover the MPs whose support, confidence and reliability are greater than the user-specified minimum support threshold (called minsup) [13], minimum confidence threshold (called minconf) [13] and minimum reliability threshold (called minR), respectively, and to construct a classifier based upon these MPs.
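As a minimal sketch, assuming each candidate MP already carries its support, confidence and reliability (reliability computed with the formula referred to above, which is not reproduced here), the De-MP selection criterion reduces to three threshold tests. The `MouclasPattern` record, its field names and the toy values are hypothetical; strict `>` follows the "greater than" wording of the text.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class MouclasPattern:
    """Hypothetical record for one candidate MP (a cluster_rule 'cluset => label')."""
    name: str           # illustrative identifier for the rule's cluset
    label: str          # class label y
    support: float
    confidence: float
    reliability: float  # assumed precomputed with the reliability formula


def select_mps(candidates: List[MouclasPattern],
               minsup: float, minconf: float, minR: float) -> List[MouclasPattern]:
    """Keep the candidates whose support, confidence and reliability all
    exceed minsup, minconf and minR respectively (strict '>')."""
    return [mp for mp in candidates
            if mp.support > minsup
            and mp.confidence > minconf
            and mp.reliability > minR]


# Toy candidates with made-up measures, purely to exercise the filter.
cands = [
    MouclasPattern("c1", "edible", 0.20, 0.90, 0.30),
    MouclasPattern("c2", "edible", 0.05, 0.95, 0.40),     # fails minsup
    MouclasPattern("c3", "poisonous", 0.15, 0.70, 0.25),  # fails minconf
]
kept = select_mps(cands, minsup=0.10, minconf=0.80, minR=0.20)
print([mp.name for mp in kept])  # ['c1']
```

Keeping the three tests in one conjunction mirrors the problem statement: an MP survives only if it is frequent, accurate and reliable at the same time.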
A Jumping MOUCLAS Pattern (JMP) can be further defined based on the notions of the Jumping Emerging Pattern [6] (JEP) and the MP. A JEP is an itemset whose support jumps from 0 in one class (say, the poisonous class in the mushroom data from the UCI repository) to at least a user-specified value in another class (say, the edible class). JEPs can then serve as an index for dimensionality reduction: for each JEP in a class y, only the attributes that appear in the JEP are kept for all transactions in class y, and clustering is then performed on those reduced transactions.
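The JEP test and the JEP-based dimensionality reduction described above can be sketched as follows, assuming transactions are attribute-to-value dictionaries and an itemset is a set of (attribute, value) pairs. The function names, the `min_target_sup` parameter and the toy rows are hypothetical illustrations, not the MOUCLAS implementation; the clustering step itself is not specified here and is left out.

```python
from typing import Dict, FrozenSet, List, Tuple

Item = Tuple[str, str]        # (attribute, value) pair
Transaction = Dict[str, str]  # attribute -> value
Itemset = FrozenSet[Item]


def support(itemset: Itemset, transactions: List[Transaction]) -> float:
    """Fraction of transactions containing every (attribute, value) item."""
    if not transactions:
        return 0.0
    hits = sum(all(t.get(a) == v for a, v in itemset) for t in transactions)
    return hits / len(transactions)


def is_jep(itemset: Itemset, zero_class: List[Transaction],
           target_class: List[Transaction], min_target_sup: float) -> bool:
    """JEP test: support is 0 in zero_class and at least min_target_sup in target_class."""
    return (support(itemset, zero_class) == 0.0
            and support(itemset, target_class) >= min_target_sup)


def project_onto_jep(itemset: Itemset,
                     transactions: List[Transaction]) -> List[Transaction]:
    """Dimensionality reduction: keep only the attributes that occur in the JEP."""
    attrs = {a for a, _ in itemset}
    return [{a: t[a] for a in sorted(attrs) if a in t} for t in transactions]


# Toy transactions loosely modelled on mushroom attributes (not real UCI rows).
poisonous = [{"odor": "foul", "cap": "x"}, {"odor": "foul", "cap": "b"}]
edible = [{"odor": "none", "cap": "x"}, {"odor": "almond", "cap": "x"}]
jep = frozenset({("odor", "none")})
print(is_jep(jep, poisonous, edible, 0.3))  # True (support 0 vs 0.5)
print(project_onto_jep(jep, edible))        # [{'odor': 'none'}, {'odor': 'almond'}]
```

The projected transactions of class y are what the subsequent clustering step would operate on.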
Let C denote the dataset of transactions d labeled with class y after dimensionality reduction by JEPs. A JMP can be defined as a cluster_rule, namely a rule:
$$\mathit{cluset} \Rightarrow y,$$
where cluset is a set of itemsets from a cluster $\mathit{Cluster}(C)_t$, obtained by clustering the transactions of a single class after dimensionality reduction via a JEP, and y is a class label, $y \in Y$. Let JMPset denote a set of JMPs that correspond to the same JEP.
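One way to hold JMPs and group them into JMPsets is sketched below. It is a minimal illustration, assuming each JMP records the JEP it was mined under; for brevity the cluset is represented only by the index t of $\mathit{Cluster}(C)_t$. The `JMP` record and `group_into_jmpsets` helper are hypothetical names.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Tuple

Item = Tuple[str, str]      # (attribute, value) pair
Itemset = FrozenSet[Item]


@dataclass(frozen=True)
class JMP:
    """Hypothetical record for a JMP 'cluset => label'.

    The cluset is represented only by the index t of Cluster(C)_t.
    """
    jep: Itemset      # JEP used for the dimensionality reduction
    cluster_id: int   # index t of the cluster the cluset comes from
    label: str        # class label y


def group_into_jmpsets(jmps: List[JMP]) -> Dict[Itemset, List[JMP]]:
    """Collect JMPs that correspond to the same JEP into one JMPset."""
    jmpsets: Dict[Itemset, List[JMP]] = defaultdict(list)
    for jmp in jmps:
        jmpsets[jmp.jep].append(jmp)
    return dict(jmpsets)


jep_a = frozenset({("odor", "none")})
jep_b = frozenset({("odor", "almond")})
jmpsets = group_into_jmpsets([
    JMP(jep_a, 0, "edible"),
    JMP(jep_a, 1, "edible"),
    JMP(jep_b, 0, "edible"),
])
print(len(jmpsets), len(jmpsets[jep_a]))  # 2 2
```

Keying on a `frozenset` makes the JEP hashable, so JMPs mined under the same JEP land in the same JMPset regardless of item order.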
Suppose cluCount is the number of transactions of C that belong to cluset, and clasCount is the number of transactions in C. The support of cluset within C (the fraction of transactions d in C that belong to cluset), denoted subsup, is then defined as:
$$\mathit{subsup} = \frac{\mathit{cluCount}}{\mathit{clasCount}}$$
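The subsup formula can be computed for all clusters of C at once, as in the minimal sketch below. It assumes a hard clustering in which each transaction of C is assigned to exactly one cluster index t; `subsup_per_cluster` and the assignment list are hypothetical.

```python
from collections import Counter
from typing import Dict, List


def subsup_per_cluster(assignments: List[int]) -> Dict[int, float]:
    """subsup = cluCount / clasCount for every cluster Cluster(C)_t.

    assignments[i] is the cluster index t of the i-th transaction of C
    (assumes each transaction belongs to exactly one cluster).
    """
    clas_count = len(assignments)        # clasCount: transactions in C
    if clas_count == 0:
        raise ValueError("C is empty, so subsup is undefined")
    clu_counts = Counter(assignments)    # cluCount for each cluster t
    return {t: n / clas_count for t, n in clu_counts.items()}


print(subsup_per_cluster([0, 0, 1, 0, 2, 1, 0, 0]))
# {0: 0.625, 1: 0.25, 2: 0.125}
```

Under the one-cluster-per-transaction assumption, the subsup values of all clusters of C sum to 1.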
Given a set of transactions, D, the problem of J-MP is to discover all JMPs, calculate their subsup, and construct a classifier based upon the JMPs.
3 The MOUCLAS-1 Algorithm
The classification technique, MOUCLAS-1, consists of two steps:
1. Discovery of frequent, accurate and reliable MPs.
2. Construction of a classifier, called De-MP, based on MPs.
The core of the first step in the MOUCLAS-1 algorithm is to find all cluster_rules that have support above minsup. Let C denote the dataset D after dimensionality reduction processing. A cluster_rule represents an MP, namely a rule: