Dealing with Missing Values in a Probabilistic Decision Tree during Classification - Mining Complex Data


Table 4.9. Tests performed on Zoo database

                              50%
Method   Threshold   well classif.   ¬ well classif.
PAT      0.1         98.46%          1.53%
PAT      0.2         100%            0%
PAT      0.3         96%             4%
C4.5     -           78.46%          21.53%
OAT      -           88.73%          11.26%

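Table 4.10. The confusion matrices of the Zoo database using PAT and C4.5

PAT:
  a  b  c  d  e  f  g   <-- classified as
  7  0  0  0  0  0  0 | a = mammal
  0  6  0  0  0  0  1 | b = bird
  1  0  2  0  1  0  0 | c = reptile
  0  0  0 10  0  0  0 | d = fish
  0  0  0  0  3  0  0 | e = amphibian
  0  1  0  0  0  3  0 | f = insect
  0  0  0  0  0  0  6 | g = invertebrate

C4.5:
  a  b  c  d  e  f  g   <-- classified as
  7  0  0  0  0  0  0 | a = mammal
  6  1  0  0  0  0  0 | b = bird
  1  0  2  0  0  0  1 | c = reptile
  1  0  0  9  0  0  0 | d = fish
  1  0  0  1  0  0  1 | e = amphibian
  1  0  0  0  0  3  0 | f = insect
  2  0  0  0  1  0  3 | g = invertebrate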

Table 4.11. The confusion matrix of the Zoo database using OAT

  a  b  c  d  e  f  g   <-- classified as
  5  0  2  0  0  0  0 | a = mammal
  0  3  3  0  1  0  0 | b = bird
  0  0  2  0  2  0  0 | c = reptile
  0  0  0 10  0  0  0 | d = fish
  0  0  0  0  3  0  0 | e = amphibian
  0  0  0  0  0  4  0 | f = insect
  0  0  0  0  0  0  6 | g = invertebrate
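In Tables 4.10 and 4.11, each row is an actual class and each column is the class an instance was "classified as", so correct classifications lie on the diagonal. The following is a minimal Python sketch of reading such a matrix; the helper names `accuracy` and `per_class_recall` and the small 3-class matrix are hypothetical illustrations, not data or code from this chapter.

```python
from typing import List, Sequence


def accuracy(matrix: Sequence[Sequence[int]]) -> float:
    """Share of all test instances that lie on the diagonal (correctly classified).

    Rows are actual classes, columns are predicted ("classified as") classes.
    """
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


def per_class_recall(matrix: Sequence[Sequence[int]]) -> List[float]:
    """For each actual class (row), the fraction that was predicted as that class."""
    recalls = []
    for i, row in enumerate(matrix):
        n_actual = sum(row)
        # Guard against a class with no test instances.
        recalls.append(row[i] / n_actual if n_actual else 0.0)
    return recalls


# Hypothetical 3-class matrix, NOT taken from Tables 4.10 or 4.11.
toy = [
    [7, 0, 0],  # actual class a
    [1, 6, 0],  # actual class b
    [0, 2, 2],  # actual class c
]

print(f"accuracy = {accuracy(toy):.4f}")  # 15 correct out of 18
print([round(r, 3) for r in per_class_recall(toy)])
```

Off-diagonal cells show which classes are confused with which; for example, a nonzero value in column a of another class's row means instances of that class were classified as mammals.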

In our experiment, we also calculated the root mean squared error (RMSE)⁷, a metric for comparing the accuracy of probability estimates [4]. Table 4.12 shows the RMSE for each method. Since RMSE is a measure of error, smaller is better: the RMSE values of both C4.5 and OAT are larger than that of PAT.

7 The root mean squared error for an instance x is given by the following equation:

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{j=1}^{n}\bigl(t(j \mid x) - P(j \mid x)\bigr)^{2}} \tag{4.2}$$

where x is the instance, j is the class value, n is the number of classes, t(j|x) is the true probability of class j for x, and P(j|x) is the probability estimated by the method for instance x and class j. For test data where the true classes are known but the probabilities are not, t(j|x) is defined to be 1 if the class of x is j and 0 otherwise.
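As a minimal sketch, assuming a method's class probabilities for an instance are available as a dictionary from class name to probability, Eq. (4.2) can be computed in Python as below. The function names `rmse_instance` and `mean_rmse`, the dictionary representation, and the plain averaging over a test set are illustrative assumptions, not the authors' implementation; the excerpt does not state how the per-instance values behind Table 4.12 are aggregated.

```python
import math
from typing import Iterable, Mapping, Sequence, Tuple


def rmse_instance(true_class: str,
                  probs: Mapping[str, float],
                  classes: Sequence[str]) -> float:
    """RMSE of Eq. (4.2) for a single instance x.

    t(j|x) is 1 when j is the known class of x and 0 otherwise;
    P(j|x) is the method's estimate (classes absent from `probs` count as 0).
    """
    n = len(classes)
    squared = 0.0
    for j in classes:
        t = 1.0 if j == true_class else 0.0
        p = probs.get(j, 0.0)
        squared += (t - p) ** 2
    return math.sqrt(squared / n)


def mean_rmse(test_set: Iterable[Tuple[str, Mapping[str, float]]],
              classes: Sequence[str]) -> float:
    """Average the per-instance RMSE over a test set.

    Assumption: a plain mean is used for illustration only.
    """
    scores = [rmse_instance(c, p, classes) for c, p in test_set]
    return sum(scores) / len(scores)


classes = ["mammal", "bird", "fish"]
confident = ("bird", {"mammal": 0.2, "bird": 0.8})  # mostly right
wrong = ("bird", {"mammal": 1.0})                   # certain and wrong

print(rmse_instance(*confident, classes))  # sqrt(0.08 / 3)
print(rmse_instance(*wrong, classes))      # sqrt(2 / 3)
print(mean_rmse([confident, wrong], classes))
```

The example illustrates why a smaller RMSE is better: a mostly correct estimate scores about 0.16, whereas a confident but wrong estimate scores about 0.82, and a perfect estimate scores 0.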
