Mining MOUCLAS Patterns and Jumping MOUCLAS Patterns to Construct Classifiers - Data Mining: Theory, Methodology, Techniques, and Applications


$$cluset \rightarrow y,$$

where cluset is a set of itemsets from a cluster $Cluster(C)_t$, $y$ is a class label, and $y \in Y$.

The support count of the cluset (called clusupCount) is the number of transactions in C that belong to the cluset. The support count of the cluster_rule (called cisupCount) is the number of transactions in D that belong to the cluset and are labeled with class y. The confidence of a cluster_rule is (cisupCount / clusupCount) × 100%. The support count of the class y (called clasupCount) is the number of transactions in C that belong to the class y. The support of a class (called clasup) is (clasupCount / |C|) × 100%, where |C| is the size of the dataset C.
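The counts defined above can be sketched in Python as follows. This is a minimal illustration under assumptions not stated in the text: a transaction is modeled as a set of item strings (for example linguistic terms such as "age=young") plus a class label, a transaction "belongs to" a cluset when it contains every item of the cluset, and all counts are taken over one list of transactions (the text counts clusupCount in C but cisupCount in D). The `Transaction` class and all function names are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, Sequence


@dataclass(frozen=True)
class Transaction:
    items: FrozenSet[str]  # assumed encoding: linguistic terms such as "age=young"
    label: str             # class label y


def clusup_count(cluset: FrozenSet[str], data: Sequence[Transaction]) -> int:
    # clusupCount: transactions that contain every item of the cluset.
    return sum(1 for t in data if cluset <= t.items)


def cisup_count(cluset: FrozenSet[str], y: str, data: Sequence[Transaction]) -> int:
    # cisupCount: transactions that contain the cluset and are labeled with class y.
    return sum(1 for t in data if cluset <= t.items and t.label == y)


def confidence(cluset: FrozenSet[str], y: str, data: Sequence[Transaction]) -> float:
    # (cisupCount / clusupCount) * 100%; assumed to be 0 when the cluset never occurs.
    n = clusup_count(cluset, data)
    return 100.0 * cisup_count(cluset, y, data) / n if n else 0.0


def clasup_count(y: str, data: Sequence[Transaction]) -> int:
    # clasupCount: transactions labeled with class y.
    return sum(1 for t in data if t.label == y)


def clasup(y: str, data: Sequence[Transaction]) -> float:
    # clasup = (clasupCount / |C|) * 100%
    return 100.0 * clasup_count(y, data) / len(data)


if __name__ == "__main__":
    data = [
        Transaction(frozenset({"age=young", "income=low"}), "no"),
        Transaction(frozenset({"age=young", "income=high"}), "yes"),
        Transaction(frozenset({"age=young", "income=low"}), "no"),
        Transaction(frozenset({"age=old", "income=low"}), "yes"),
    ]
    young = frozenset({"age=young"})
    print(clusup_count(young, data))       # 3
    print(cisup_count(young, "no", data))  # 2
    print(clasup("no", data))              # 50.0
```

In this toy data the rule {age=young} → no has confidence 2/3 × 100% while class "no" has support 50%, a gap that the reliability measure below makes explicit.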

Given an MP, its reliability R is defined as:

$$R(cluset \rightarrow y) = \left( \frac{cisupCount}{clusupCount} - \frac{clasupCount}{|C|} \right) \times 100\%$$
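Because the first term is the rule's confidence and the second is the support of class y, R measures how far the rule's confidence rises above the base rate of its class. A minimal sketch taking the four counts as plain integers; the function name is hypothetical, and raising an error on a zero denominator is an added assumption.

```python
def reliability(cisup_count: int, clusup_count: int,
                clasup_count: int, size_c: int) -> float:
    """R(cluset -> y) = (cisupCount / clusupCount - clasupCount / |C|) * 100%."""
    if clusup_count <= 0 or size_c <= 0:
        raise ValueError("clusupCount and |C| must be positive")
    rule_confidence = cisup_count / clusup_count  # confidence as a fraction
    class_support = clasup_count / size_c         # base rate of class y
    return (rule_confidence - class_support) * 100.0


if __name__ == "__main__":
    # The cluset occurs in 3 transactions, all labeled y; class y covers 1 of 4.
    print(reliability(3, 3, 1, 4))  # 75.0
```

A rule whose confidence merely equals its class's support gets R = 0, so a positive minR filters out rules that say no more than the class prior.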

Traditional association rule mining uses a single minsup in rule generation, which is inadequate for many practical datasets with uneven class frequency distributions. If the single minsup value is set too high, too few rules may be found for the infrequent classes; if it is set too low, too many rules may be found for the frequent classes, many of them useless or over-fitting. To overcome this drawback, we apply the theory of mining with multiple minimum supports [14] in the step of discovering the frequent MPs, as follows.

Suppose the total minimum support is t-minsup. The minimum class support for each class y, denoted $minsup_i$, is then defined by the formula:

$$minsup_i = t\text{-}minsup \times freqDistr(y)$$

where freqDistr(y) is the class distribution function, i.e., the relative frequency of class y in the training data. Cluster_rules that satisfy $minsup_i$ are called frequent cluster_rules, and the rest are called infrequent cluster_rules. If the confidence of an MP is greater than minconf, the MP is said to be accurate.
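A minimal sketch of the multiple-minimum-support test, assuming that freqDistr(y) is the fraction of training transactions labeled y, that t-minsup and rule supports are fractions in [0, 1], and that a cluster_rule's support is cisupCount / |C| (the text does not define rule support explicitly). The function names are hypothetical.

```python
from collections import Counter
from typing import Dict, List


def class_minsups(labels: List[str], t_minsup: float) -> Dict[str, float]:
    """Return minsup_i = t_minsup * freqDistr(y) for every class y.

    Assumption: freqDistr(y) is the relative frequency of class y.
    """
    counts = Counter(labels)
    n = len(labels)
    return {y: t_minsup * c / n for y, c in counts.items()}


def is_frequent(rule_support: float, y: str, minsups: Dict[str, float]) -> bool:
    # A cluster_rule is frequent if its support reaches its own class threshold.
    return rule_support >= minsups[y]


def is_accurate(confidence_pct: float, minconf_pct: float) -> bool:
    # An MP is accurate if its confidence is strictly greater than minconf.
    return confidence_pct > minconf_pct


if __name__ == "__main__":
    labels = ["common"] * 3 + ["rare"]
    minsups = class_minsups(labels, t_minsup=0.5)
    print(minsups)  # {'common': 0.375, 'rare': 0.125}
    # The same rule support of 0.2 is frequent for the rare class only.
    print(is_frequent(0.2, "rare", minsups), is_frequent(0.2, "common", minsups))  # True False
```

Scaling the threshold by class frequency is what lets a rare class keep rules whose absolute support would fall below a single global minsup.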

The first step of the MOUCLAS-1 algorithm solves the problem of discovering a set of MPs in three sub-steps:

Algorithm: Mining frequent, accurate and reliable MOUCLAS patterns (MPs)

Input: A training transaction database D; minimum support thresholds ($minsup_i$); minimum confidence threshold (minconf); minimum reliability threshold (minR)

Output: A set of frequent, accurate and reliable MOUCLAS patterns (MPs)

Methods:

(1) Reduce the dimensionality of the transactions d, which efficiently reduces the data size by removing irrelevant or redundant attributes (or dimensions) from the training data;

(2) Identify the clusters of database C for all transactions d after dimensionality reduction on the attributes $A_j$ of C, based on the Mountain function, a fuzzy set membership function that is particularly suited to transforming the quantitative attribute values of transactions into linguistic terms; and

(3) Generate a set of MPs that are frequent, accurate and reliable, i.e., that satisfy the user-specified minimum support ($minsup_i$), minimum confidence (minconf) and minimum reliability (minR) constraints.
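Sub-step (2) can be illustrated with the mountain method of Yager and Filev, one well-known way to build a mountain function. The text does not say which variant MOUCLAS-1 uses, so the following is a hedged sketch for a single quantitative attribute, not the authors' procedure. It assumes a Gaussian form with squared distance, a user-chosen grid of candidate centres, and the parameters `alpha`, `beta` and `n_centers`; all function names and the linguistic labels are hypothetical.

```python
import math
from typing import List, Sequence


def mountain_centers(values: Sequence[float], grid: Sequence[float],
                     alpha: float, beta: float, n_centers: int) -> List[float]:
    """Pick n_centers cluster centres for one attribute with the mountain method."""
    # Mountain function on each grid point: M(v) = sum_k exp(-alpha * (v - x_k)^2)
    m = [sum(math.exp(-alpha * (v - x) ** 2) for x in values) for v in grid]
    centers: List[float] = []
    for _ in range(n_centers):
        # The highest remaining peak becomes the next centre.
        i = max(range(len(grid)), key=lambda j: m[j])
        peak, c = m[i], grid[i]
        centers.append(c)
        # "Destroy" the mountain around c so the next peak lies elsewhere.
        m = [mv - peak * math.exp(-beta * (v - c) ** 2) for mv, v in zip(m, grid)]
    return centers


def membership(x: float, c: float, alpha: float) -> float:
    # Fuzzy membership of value x in the cluster centred at c (1.0 at the centre).
    return math.exp(-alpha * (x - c) ** 2)


def to_linguistic(x: float, centers: Sequence[float],
                  terms: Sequence[str], alpha: float) -> str:
    # Replace a quantitative value by the term of its highest-membership cluster.
    best = max(range(len(centers)), key=lambda i: membership(x, centers[i], alpha))
    return terms[best]


if __name__ == "__main__":
    incomes = [1.0, 1.1, 0.9, 5.0, 5.2, 4.8]
    grid = [0.5 * i for i in range(13)]  # candidate centres 0.0, 0.5, ..., 6.0
    centers = mountain_centers(incomes, grid, alpha=1.0, beta=1.0, n_centers=2)
    print(centers)  # [1.0, 5.0]
    print(to_linguistic(4.0, centers, ["low", "high"], alpha=1.0))  # high
```

Once every value is replaced by a linguistic term, each transaction becomes a set of discrete items, which is the form the support, confidence and reliability counts in sub-step (3) operate on.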
