Evaluation of Classification Trees - Data Mining with Decision Trees: Theory and Applications


Table 4.5

An example for calculating PEM for instances of Table 4.2.

Place    Success      t[k]  Model    Random   Optimal  Model −  Optimal −
in list  probability        Qrecall  Qrecall  Qrecall  Random   Random
1        0.45         1     0.25     0.1      0.25     0.15     0.15
2        0.34         0     0.25     0.2      0.5      0.05     0.3
3        0.32         1     0.5      0.3      0.75     0.2      0.45
4        0.26         1     0.75     0.4      1        0.35     0.6
5        0.15         0     0.75     0.5      1        0.25     0.5
6        0.14         0     0.75     0.6      1        0.15     0.4
7        0.09         1     1        0.7      1        0.3      0.3
8        0.07         0     1        0.8      1        0.2      0.2
9        0.06         0     1        0.9      1        0.1      0.1
10       0.03         0     1        1        1        0        0
Total                                                  1.75     3

where n⁻ denotes the number of instances that are actually classified as
“negative”. Table 4.5 illustrates the calculation of PEM for the instances
in Table 4.2. Note that the random Qrecall does not represent a particular
realization but the expected values. The optimal Qrecall is calculated as if
the “positive” instances had been located at the top of the list.
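The calculation in Table 4.5 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it takes PEM to be the ratio between the summed model-minus-random Qrecall differences and the summed optimal-minus-random differences, and it assumes that the "positive" instances sit at places 1, 3, 4 and 7 of the list, which is the ordering consistent with the model Qrecall values in the table. The function names `qrecall_curve` and `pem` are hypothetical helpers, not part of any library.

```python
from itertools import accumulate


def qrecall_curve(labels):
    """Share of all positives found within the top k places, for k = 1..n."""
    total_pos = sum(labels)
    return [found / total_pos for found in accumulate(labels)]


def pem(labels_by_score):
    """Return (s1, s2, s1 / s2) for 0/1 labels sorted by descending score.

    Assumption: PEM is the summed gap between the model and the random
    Qrecall, divided by the summed gap between the optimal and the
    random Qrecall.
    """
    n = len(labels_by_score)
    model = qrecall_curve(labels_by_score)
    # Expected Qrecall of a random ordering: k / n at place k.
    random_q = [(k + 1) / n for k in range(n)]
    # Optimal ordering: all positives moved to the top of the list.
    optimal = qrecall_curve(sorted(labels_by_score, reverse=True))
    s1 = sum(m - r for m, r in zip(model, random_q))
    s2 = sum(o - r for o, r in zip(optimal, random_q))
    return s1, s2, s1 / s2


# Assumed labels for the ten instances, ordered by success probability.
labels = [1, 0, 1, 1, 0, 0, 1, 0, 0, 0]
s1, s2, value = pem(labels)
print(round(s1, 2), round(s2, 2), round(value, 3))
```

Under these assumptions the first sum reproduces the total of 1.75 shown in the table, the second sum is 3, and the resulting ratio is roughly 0.58.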

Note that the PEM somewhat resembles the Gini index produced from
Lorenz curves, which appear in economics when dealing with the distribution
of income. Indeed, this measure indicates the difference between the
distribution of positive samples in a prediction and the uniform distribution.
Note also that this measure gives an indication of the total lift of the model
at every point. At every quota size, the difference between the Qrecall of the
model and the Qrecall of a random model expresses the lift in extracting
the potential of the test set due to the use of the model (for good or for
bad).

4.2.7 Which Decision Tree Classifier is Better?

Below we discuss some of the most common statistical methods proposed
[Dietterich (1998)] for answering the following question: Given two inducers
A and B and a dataset S, which inducer will produce more accurate
classifiers when trained on datasets of the same size?

4.2.7.1 McNemar's Test

Let S be the available set of data, which is divided into a training set R

and a test set T . Then we consider two inducers A and B trained on the
