cluset → y, where cluset is a set of itemsets from a cluster Cluster(C)_t, y is a class label, and y ∈ Y.
The support count of the cluset (called clusupCount) is the number of transactions in
C that belong to the cluset. The support count of the cluster_rule (called cisupCount)
is the number of transactions in D that belong to the cluset and are labeled with class
y. The confidence of a cluster_rule is (cisupCount / clusupCount) × 100%. The
support count of the class y (called clasupCount) is the number of transactions in C
that belong to the class y. The support of a class (called clasup) is (clasupCount / |C|)
× 100%, where |C| is the size of the dataset C.
Given an MP, the reliability R can be defined as:
R(cluset → y) = [(cisupCount / clusupCount) - (clasupCount / |C|)] × 100%
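The measures defined above can be illustrated with a minimal Python sketch. It assumes, for simplicity, that all counts are taken over a single list of transactions, each represented as a hypothetical pair (in_cluset, label); the function name rule_measures and this data layout are illustrative assumptions, not part of the original method.

```python
def rule_measures(transactions, y):
    """Confidence, class support and reliability (in percent) for cluset -> y.

    transactions: list of (in_cluset, label) pairs (assumed representation).
    """
    total = len(transactions)
    # clusupCount: transactions that belong to the cluset
    clusup_count = sum(1 for in_cluset, _ in transactions if in_cluset)
    # cisupCount: transactions in the cluset that are labeled with class y
    cisup_count = sum(
        1 for in_cluset, label in transactions if in_cluset and label == y
    )
    # clasupCount: transactions labeled with class y
    clasup_count = sum(1 for _, label in transactions if label == y)

    if total == 0 or clusup_count == 0:
        raise ValueError("dataset and cluset must both be non-empty")

    confidence = cisup_count / clusup_count * 100
    clasup = clasup_count / total * 100
    return {
        "confidence": confidence,
        "clasup": clasup,
        "reliability": confidence - clasup,
    }


# Toy data: 4 transactions in the cluset (3 of class "a"), 10 in total (5 of class "a").
data = (
    [(True, "a")] * 3 + [(True, "b")] * 1
    + [(False, "a")] * 2 + [(False, "b")] * 4
)
measures = rule_measures(data, "a")
print(measures)
```

In this toy dataset the rule's confidence (75%) exceeds the overall support of class "a" (50%), so the reliability is positive: membership in the cluset raises the chance of seeing class "a" above its base rate.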
Traditional association rule mining uses only a single minsup in rule
generation, which is inadequate for many practical datasets with uneven class
frequency distributions. If the single minsup value is too high, too few rules may be
found for infrequent classes; if it is too low, too many rules may be found for
frequent classes, many of them useless or over-fitting. To
overcome this drawback, we apply the theory of mining with multiple minimum
supports [14] in the step of discovering the frequent MPs, as follows.
Suppose the total support is t-minsup. The minimum class support for each
class y_i, denoted minsup_i, can then be defined by the formula:
minsup_i = t-minsup × freqDistr(y_i)
where freqDistr(y_i) is the function of class distributions. Cluster_rules that satisfy
minsup_i are called frequent cluster_rules, while the rest are called infrequent
cluster_rules. If the confidence of an MP is greater than minconf, we say the MP is
accurate.
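A minimal Python sketch of the per-class thresholds follows. It assumes that freqDistr(y_i) is the relative frequency of class y_i in the training data and that "satisfying" minsup_i means a support greater than or equal to it; both are assumptions, as are the names class_minsups and is_frequent.

```python
from collections import Counter


def class_minsups(labels, t_minsup):
    """Per-class minimum support: minsup_i = t_minsup * freqDistr(y_i).

    freqDistr is assumed here to be the relative class frequency.
    """
    counts = Counter(labels)
    n = len(labels)
    return {y: t_minsup * count / n for y, count in counts.items()}


def is_frequent(rule_support, y, minsups):
    """A rule for class y is frequent if its support reaches minsup for y."""
    return rule_support >= minsups[y]


# Toy data: an uneven class distribution (80% "common", 20% "rare").
labels = ["common"] * 80 + ["rare"] * 20
minsups = class_minsups(labels, t_minsup=10.0)  # t-minsup of 10%
print(minsups)

# A rule for the rare class with 3% support passes its own threshold,
# although it would fail a single global threshold of 10%.
print(is_frequent(3.0, "rare", minsups))
```

Scaling the threshold by class frequency lowers the bar for infrequent classes and keeps it higher for frequent ones, which is the effect the multiple-minimum-support approach is meant to achieve.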
The first step of the MOUCLAS-1 algorithm works in three sub-steps, which together
solve the problem of discovering a set of MPs:
Algorithm: Mining frequent, accurate and reliable MOUCLAS patterns (MPs)
Input: A training transaction database, D; minimum support threshold (minsup_i);
minimum confidence threshold (minconf); minimum reliability threshold (minR)
Output: A set of frequent, accurate and reliable MOUCLAS patterns (MPs)
Methods:
(1) Reduce the dimensionality of the transactions d, which efficiently reduces the
data size by removing irrelevant or redundant attributes (or dimensions) from
the training data, and
(2) Identify the clusters of database C for all transactions d, after dimensionality
reduction on the attributes A_j in database C, based on the Mountain function,
which is a fuzzy set membership function and is especially capable of
transforming quantitative values of attributes in transactions into linguistic
terms, and
(3) Generate the set of MPs that are frequent, accurate and reliable, namely,
those that satisfy the user-specified minimum support (called minsup_i), minimum
confidence (called minconf) and minimum reliability (called minR)
constraints.
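Sub-step (2) relies on the Mountain function, whose exact form is not given in this passage. As a rough illustration only, the sketch below uses one common formulation from mountain clustering, in which each candidate centre v is scored by M(v) = Σ_k exp(-α · |v - x_k|) over the data points x_k, and the highest-scoring candidate is taken as the first cluster centre. The one-dimensional setting, the candidate grid, the parameter alpha and the function names are all assumptions and may differ from the form used in MOUCLAS.

```python
import math


def mountain(v, points, alpha=1.0):
    """Mountain value of candidate centre v for 1-D data points (assumed form)."""
    return sum(math.exp(-alpha * abs(v - x)) for x in points)


def first_cluster_centre(points, candidates, alpha=1.0):
    """Return the candidate with the highest mountain value."""
    return max(candidates, key=lambda v: mountain(v, points, alpha))


# Toy quantitative attribute: three values near 1.0 and one outlier at 5.0.
values = [0.9, 1.0, 1.1, 5.0]
grid = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0]  # hypothetical candidate centres
centre = first_cluster_centre(values, grid)
print(centre)
```

Candidates surrounded by many nearby points accumulate a high score, so the peak falls in the dense region around 1.0 rather than at the isolated value 5.0.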
 