cluset → y, where cluset is a set of itemsets from a cluster Cluster(C)_t, y is a class label, and y ∈ Y.
The support count of the cluset (called clusupCount) is the number of transactions in C that belong to the cluset. The support count of the cluster_rule (called cisupCount) is the number of transactions in D that belong to the cluset and are labeled with class y. The confidence of a cluster_rule is (cisupCount / clusupCount) × 100%. The support count of the class y (called clasupCount) is the number of transactions in C that belong to the class y. The support of a class (called clasup) is (clasupCount / |C|) × 100%, where |C| is the size of the dataset C.
Given an MP, the reliability R can be defined as:

$$R(\mathit{cluset} \rightarrow y) = \left( \frac{\mathit{cisupCount}}{\mathit{clusupCount}} - \frac{\mathit{clasupCount}}{|C|} \right) \times 100\%$$
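The counts and ratios defined above can be illustrated with a minimal Python sketch. It is not the authors' implementation. The `Transaction` type, the `belongs_to` membership test (a transaction is taken to belong to the cluset if it contains at least one of its itemsets) and the toy data are assumptions made for illustration, and the datasets C and D are treated as the same list of transactions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transaction:
    items: frozenset  # items contained in the transaction
    label: str        # class label y


def belongs_to(transaction, cluset):
    # Assumption: membership means the transaction contains
    # at least one itemset of the cluset.
    return any(itemset <= transaction.items for itemset in cluset)


def rule_measures(transactions, cluset, y):
    """Compute the measures of the rule cluset -> y as percentages."""
    clusup_count = sum(1 for t in transactions if belongs_to(t, cluset))
    cisup_count = sum(
        1 for t in transactions if belongs_to(t, cluset) and t.label == y
    )
    clasup_count = sum(1 for t in transactions if t.label == y)

    confidence = 100.0 * cisup_count / clusup_count if clusup_count else 0.0
    clasup = 100.0 * clasup_count / len(transactions)
    reliability = confidence - clasup  # R(cluset -> y) as defined above
    return {
        "clusupCount": clusup_count,
        "cisupCount": cisup_count,
        "clasupCount": clasup_count,
        "confidence": confidence,
        "clasup": clasup,
        "reliability": reliability,
    }


# Toy data (hypothetical): five labeled transactions.
EXAMPLE = [
    Transaction(frozenset({"a", "b"}), "pos"),
    Transaction(frozenset({"a", "b", "c"}), "pos"),
    Transaction(frozenset({"a"}), "neg"),
    Transaction(frozenset({"c"}), "neg"),
    Transaction(frozenset({"a", "b"}), "neg"),
]
EXAMPLE_CLUSET = [frozenset({"a", "b"})]
measures = rule_measures(EXAMPLE, EXAMPLE_CLUSET, "pos")
```

In this toy example three transactions belong to the cluset and two of them are labeled `pos`, so the confidence is about 66.7%, the class support is 40%, and the reliability is the difference between the two.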
Traditional association rule mining uses only a single minsup in rule generation, which is inadequate for many practical datasets with uneven class frequency distributions. If the single minsup value is too high, too few rules may be found for infrequent classes; if it is too low, too many rules may be found for frequent classes, producing useless or over-fitted rules. To overcome this drawback, we apply the theory of mining with multiple minimum supports [14] in the step of discovering the frequent MPs, as follows.
Suppose the total support is t-minsup. The minimum class support for each class y, denoted minsup_i, can then be defined by the formula:

$$\mathit{minsup}_i = \mathit{t\text{-}minsup} \times \mathrm{freqDistr}(y)$$

where freqDistr(y) is the function of class distributions.
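As a rough illustration of the formula, the sketch below derives one minimum support per class from a list of class labels. It assumes that freqDistr(y) is the relative frequency of class y in the training data, which the text does not state explicitly. The function name `class_min_supports` is hypothetical.

```python
from collections import Counter


def class_min_supports(labels, t_minsup):
    """Return {class label: minsup_i}, scaling t_minsup by class frequency.

    Assumption: freqDistr(y) = (number of transactions labeled y) / (total).
    """
    counts = Counter(labels)
    total = len(labels)
    return {y: t_minsup * counts[y] / total for y in counts}


# Hypothetical skewed dataset: 8 transactions of class "a", 2 of class "b".
labels = ["a"] * 8 + ["b"] * 2
minsups = class_min_supports(labels, t_minsup=10.0)  # t-minsup of 10%
```

Under this assumption the frequent class `a` receives a higher threshold than the rare class `b`, so rules for the rare class are not discarded by a single global threshold.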
Cluster_rules that satisfy minsup_i are called frequent cluster_rules, while the rest are called infrequent cluster_rules. If the confidence is greater than minconf, we say the MP is accurate.
The first step of the MOUCLAS-1 algorithm works in three sub-steps, which together solve the problem of discovering a set of MPs:
Algorithm: Mining frequent, accurate and reliable MOUCLAS patterns (MPs)
Input: A training transaction database, D; minimum support threshold (minsup_i); minimum confidence threshold (minconf); minimum reliability threshold (minR)
Output: A set of frequent, accurate and reliable MOUCLAS patterns (MPs)
Methods:
(1) Reduce the dimensionality of the transactions d, which efficiently reduces the data size by removing irrelevant or redundant attributes (or dimensions) from the training data, and
(2) Identify the clusters of database C for all transactions d after dimensionality reduction on attributes A_j in database C, based on the Mountain function, which is a fuzzy set membership function and is especially capable of transforming quantitative values of attributes in transactions into linguistic terms, and
(3) Generate a set of MPs that are frequent, accurate and reliable, namely, which satisfy the user-specified minimum support (called minsup_i), minimum confidence (called minconf) and minimum reliability (called minR) constraints.
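Sub-step (3) can be sketched as a simple filter over candidate rules. This is a minimal illustration rather than the MOUCLAS-1 procedure itself. The candidate records and their keys (`name`, `label`, `support`, `confidence`, `reliability`) are hypothetical, all values are percentages, and the comparisons for minsup_i and minR are assumed to be "greater than or equal to", while the text specifies only that confidence must be greater than minconf.

```python
def select_mps(candidates, minsup_by_class, minconf, min_r):
    """Keep candidate rules that are frequent, accurate and reliable."""
    selected = []
    for rule in candidates:
        # Frequent: support meets the class-specific minsup_i (assumed >=).
        frequent = rule["support"] >= minsup_by_class[rule["label"]]
        # Accurate: confidence is greater than minconf (as in the text).
        accurate = rule["confidence"] > minconf
        # Reliable: reliability meets minR (assumed >=).
        reliable = rule["reliability"] >= min_r
        if frequent and accurate and reliable:
            selected.append(rule)
    return selected


# Hypothetical candidates with precomputed measures.
candidates = [
    {"name": "r1", "label": "pos", "support": 6.0,
     "confidence": 80.0, "reliability": 30.0},
    {"name": "r2", "label": "neg", "support": 3.0,
     "confidence": 90.0, "reliability": 40.0},  # below minsup for "neg"
    {"name": "r3", "label": "pos", "support": 5.0,
     "confidence": 55.0, "reliability": 5.0},   # below minconf and minR
]
kept = select_mps(
    candidates,
    minsup_by_class={"pos": 4.0, "neg": 6.0},
    minconf=60.0,
    min_r=10.0,
)
kept_names = [rule["name"] for rule in kept]
```

Only `r1` passes all three constraints here: `r2` is infrequent for its class, and `r3` is neither accurate nor reliable.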