Database Reference
In-Depth Information
In the first sub-step, the
MPs
are sorted in descending order, which has the training
transactions surely covered by the
MPs
with the highest precedence when possible in
the next sub-step. The sort of the whole set of
MPs
is performed following the
definition of
precedence
:
Given two
MPs
, we say that
MP
a
has a higher precedence than
MP
b
, denoted as
MP
a
≻
MP
b
,
if
MPs
, it holds that: the confidence of
MP
a
is greater than that of
MP
b
, or if their confidences are the same, but the support of
MP
a
is greater than that
of
MP
b
, or if both the confidences and supports of
MP
a
and
MP
b
are the same, but
MP
a
is generated earlier than
MP
b
.
In the second sub-step, we test the
MPs
following decreasing precedence and stop
the sub-step when there is no rule or no training transaction. For each
MP
, we scan
C
to find those transactions satisfying the cluset of the
MP
. If the
MP
can correctly
classify one transaction, we store it in a set denoted as
L
. Those transactions satisfying
the cluset of the
MP
will be removed from
C
at each pass. Each transaction can be
identified by a unique ID. The next pass will be performed on the remaining data. A
default class is defined at each scan, which is the majority class in the remaining data.
At the end of each pass, the total number of errors that are made by the current
L
and
the default class are also stored. When there is no rule or no training transaction left,
we terminate this sub-step. After this sub-step, every
MP
in
L can
correctly classify at
least one training transaction in
C
.
In the third sub-step, though we would like to find as many
MPs
as possible to give
good coverage of the training transactions in the second sub-step, we prefer strong
MPs
which have relatively high support and confidence, due to their characteristics of
corresponding to larger coverage and stronger differentiating power. Meanwhile, we
hope that the
De-MP
classifier, consisting of a combination of strong
MPs
, has a
∀
MP
a
,
MP
b
∈
1
R
= sort(
R
); // sort MPs based on their precedence
2
for
each
MP
R
in sequence
do
3
temp
= ;
4
for
each transaction
d
C
do
5
if
d
satisfies the cluset of
MP
then
6 store
d
.ID in
temp
;
7
if
MP
correctly classifies
d
then
8 insert
MP
at the end of
L
;
9 delete the transaction who has ID in
temp
from
C
;
10
selecting a default class for the current
L
; // determine the default class based on majority class of
remaining transactions in
C
11
end
12 compute the total number of errors of
L
; // compute the total number of errors that are made by the
current
L
and the default class
13
end
14 Find the first MP in
L
with the lowest total number of errors and discard all the
MPs
after the
MP
in
L
;
15 Add the default class associated with the above mentioned first MP to end of
L
;
16
De-MP
classifier =
L
Fig. 2.
The Second Step of the MOUCLAS Algorithm
Search WWH ::
Custom Search