Table 4.9. Tests performed on the Zoo database (50%)

Method   Threshold   Well classified   Not well classified
PAT      0.1         98.46%            1.53%
PAT      0.2         100%              0%
PAT      0.3         96%               4%
C4.5     -           78.46%            21.53%
OAT      -           88.73%            11.26%
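Table 4.10. The confusion matrix of the Zoo database using PAT and C4.5

PAT:
 a  b  c  d  e  f  g   <-- classified as
 7  0  0  0  0  0  0 | a = mammal
 0  6  0  0  0  0  1 | b = bird
 1  0  2  0  1  0  0 | c = reptile
 0  0  0 10  0  0  0 | d = fish
 0  0  0  0  3  0  0 | e = amphibian
 0  1  0  0  0  3  0 | f = insect
 0  0  0  0  0  0  6 | g = invertebrate

C4.5:
 a  b  c  d  e  f  g   <-- classified as
 7  0  0  0  0  0  0 | a = mammal
 6  1  0  0  0  0  0 | b = bird
 1  0  2  0  0  0  1 | c = reptile
 1  0  0  9  0  0  0 | d = fish
 1  0  0  1  0  0  1 | e = amphibian
 1  0  0  0  0  3  0 | f = insect
 2  0  0  0  1  0  3 | g = invertebrate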
Table 4.11. The confusion matrix of the Zoo database using OAT

 a  b  c  d  e  f  g   <-- classified as
 5  0  2  0  0  0  0 | a = mammal
 0  3  3  0  1  0  0 | b = bird
 0  0  2  0  2  0  0 | c = reptile
 0  0  0 10  0  0  0 | d = fish
 0  0  0  0  3  0  0 | e = amphibian
 0  0  0  0  0  4  0 | f = insect
 0  0  0  0  0  0  6 | g = invertebrate
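The confusion matrices above use the usual layout in which each row is an actual class and each column is the class it was classified as. As a minimal sketch, assuming that layout, the proportion of well-classified instances is the diagonal sum divided by the total count, and per-class recall is each diagonal entry divided by its row sum. The helper names `accuracy` and `per_class_recall` and the small 3x3 matrix in the usage line are hypothetical illustrations, not data from these experiments.

```python
from typing import List


def accuracy(confusion: List[List[int]]) -> float:
    """Fraction of instances on the diagonal of a square confusion matrix.

    Assumes rows are actual classes and columns are predicted classes.
    Returns 0.0 for an empty (all-zero) matrix.
    """
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    return correct / total if total else 0.0


def per_class_recall(confusion: List[List[int]]) -> List[float]:
    """For each actual class (row), the share of its instances classified correctly."""
    recalls = []
    for i, row in enumerate(confusion):
        row_total = sum(row)
        recalls.append(row[i] / row_total if row_total else 0.0)
    return recalls


# Hypothetical 3-class matrix, only to show usage.
example = [[5, 1, 0],
           [0, 4, 1],
           [0, 0, 3]]
print(accuracy(example))          # 12 correct out of 14 instances
print(per_class_recall(example))
```

Reading the matrices this way makes the off-diagonal entries directly interpretable: for example, a non-zero value in row c, column a counts reptiles that were classified as mammals.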
In our experiment, we also calculated the Root Mean Squared Error (RMSE)⁷, a metric for comparing the accuracy of probability estimates [4]. Table 4.12 shows the RMSE for each method. Since RMSE measures error, smaller values are better: the RMSE values of both C4.5 and OAT are larger than that of PAT.
⁷ The root mean squared error for an instance x is given by the following equation:

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{j=1}^{n}\bigl(t(j|x) - P(j|x)\bigr)^{2}} \qquad (4.2)$$
where x is the instance, j is the class value (j = 1, ..., n, with n the number of class values), t(j|x) is the true probability of class j for x, and P(j|x) is the probability estimated by the method for instance x and class j. For test data where the true classes are known but their probabilities are not, t(j|x) is defined to be 1 if the class of x is j and 0 otherwise.
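Equation (4.2) can be sketched in Python as follows, assuming 0-based class indices and a list of estimated probabilities P(j|x) ordered by class; t(j|x) is set to 1 for the known class and 0 otherwise, as described above. The names `rmse_instance` and `mean_rmse` are hypothetical. The text does not state how the per-instance values are combined into the per-method figures of Table 4.12, so averaging them in `mean_rmse` is an assumption.

```python
import math
from typing import Sequence


def rmse_instance(true_class: int, probs: Sequence[float]) -> float:
    """RMSE of Eq. (4.2) for one instance x.

    true_class: 0-based index of the known class of x.
    probs: estimated P(j|x) for each of the n classes.
    t(j|x) is 1 for the true class and 0 for every other class.
    """
    n = len(probs)
    if not 0 <= true_class < n:
        raise ValueError("true_class must index into probs")
    squared_error = sum(
        ((1.0 if j == true_class else 0.0) - p) ** 2 for j, p in enumerate(probs)
    )
    return math.sqrt(squared_error / n)


def mean_rmse(true_classes: Sequence[int], prob_rows: Sequence[Sequence[float]]) -> float:
    """Assumed aggregation: arithmetic mean of per-instance RMSE over a test set."""
    if not prob_rows or len(true_classes) != len(prob_rows):
        raise ValueError("need one probability vector per instance")
    total = sum(rmse_instance(c, p) for c, p in zip(true_classes, prob_rows))
    return total / len(prob_rows)


# Class 0 is correct and the method gives it probability 0.7:
print(rmse_instance(0, [0.7, 0.2, 0.1]))   # about 0.216
```

A confident, correct estimate drives the value toward 0, while putting all probability on a wrong class gives sqrt(2/n), which is why smaller RMSE indicates better probability estimates.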