Evaluation of Classification Trees - Data Mining with Decision Trees: Theory and Applications


Fig. 4.4 A typical hit curve (Hit-Rate [%] versus Quota).

positive instances in the entire dataset. Thus, the Qrecall for a quota of j is defined as:

$$\text{Qrecall}(j) = \frac{\sum_{k=1}^{j} t[k]}{n^{+}} \tag{4.12}$$

The denominator is the total number of positive instances in the entire dataset, that is, the instances whose actual class is positive. Formally, it can be calculated as:

$$n^{+} = \left|\{\langle x_i, y_i \rangle : y_i = \text{pos}\}\right| \tag{4.13}$$
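The Python sketch below computes Qrecall from Eqs. (4.12) and (4.13). It assumes that t[k] equals 1 when the k-th instance, after sorting by descending probability of being positive, is truly positive, and 0 otherwise. The function names `rank_labels` and `qrecall` and the toy scores are hypothetical, chosen only for illustration.

```python
from typing import List, Sequence


def rank_labels(scores: Sequence[float], labels: Sequence[str]) -> List[int]:
    """Sort instances by descending score and encode each true class as t[k].

    Assumption: t[k] = 1 if the k-th ranked instance is truly positive ("pos"),
    otherwise 0.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [1 if labels[i] == "pos" else 0 for i in order]


def qrecall(t: Sequence[int], j: int) -> float:
    """Qrecall(j), Eq. (4.12): share of all positives that fall within the top-j quota."""
    if not 1 <= j <= len(t):
        raise ValueError("quota j must be between 1 and the number of instances")
    n_pos = sum(t)  # n+ from Eq. (4.13): number of truly positive instances
    if n_pos == 0:
        raise ValueError("Qrecall is undefined when there are no positive instances")
    return sum(t[:j]) / n_pos  # numerator: sum of t[k] for k = 1..j (0-based slice)


t = rank_labels([0.9, 0.8, 0.7, 0.6, 0.4, 0.2], ["pos", "pos", "neg", "pos", "neg", "neg"])
print(t)                        # [1, 1, 0, 1, 0, 0]
print(round(qrecall(t, 2), 3))  # 0.667
print(qrecall(t, 4))            # 1.0
```

With three positives in the toy data, a quota of 2 captures two of them (Qrecall of 2/3), and a quota of 4 already captures all three, so Qrecall reaches 1.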

4.2.6.4 Lift Curve

A popular method of evaluating probabilistic models is lift. After a ranked test set is divided into several portions (usually deciles), lift is calculated as follows [Coppock (2002)]: the proportion of truly positive instances in a specific decile is divided by the proportion of truly positive instances in the entire population. Regardless of how the test set is divided, a good model shows a lift that decreases as one proceeds toward the bottom of the scoring list: it presents a lift greater than 1 in the top deciles and a lift smaller than 1 in the last deciles. Figure 4.5 illustrates a lift chart for a typical model prediction. Models can be compared by the lift of their top portions, where the number of top portions that matter depends on the available resources and on cost/benefit considerations.
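A minimal Python sketch of the decile lift computation described above. The helper name `decile_lift`, the 0/1 label encoding, and the rule for splitting the ranked list into contiguous, nearly equal portions are assumptions for illustration, not details given in the text.

```python
from typing import List, Sequence


def decile_lift(scores: Sequence[float], labels: Sequence[int], n_parts: int = 10) -> List[float]:
    """Lift of each portion of a ranked test set (hypothetical helper).

    Instances are sorted by descending score and split into n_parts contiguous
    portions. Lift of a portion = (fraction of positives in the portion)
    / (fraction of positives in the whole set). labels: 1 = truly positive, 0 = not.
    """
    n = len(scores)
    if n < n_parts:
        raise ValueError("need at least one instance per portion")
    base_rate = sum(labels) / n  # average proportion of positives in the population
    if base_rate == 0:
        raise ValueError("lift is undefined when there are no positive instances")
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    lifts = []
    for p in range(n_parts):
        # integer boundaries spread any remainder across the portions
        start, end = p * n // n_parts, (p + 1) * n // n_parts
        part = [labels[i] for i in order[start:end]]
        lifts.append((sum(part) / len(part)) / base_rate)
    return lifts


scores = [float(20 - i) for i in range(20)]  # toy scores, already in descending order
labels = [1, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0]
print(decile_lift(scores, labels))
# [4.0, 2.0, 2.0, 0.0, 0.0, 0.0, 2.0, 0.0, 0.0, 0.0]
```

In the toy data 5 of 20 instances are positive (a base rate of 0.25), so the top decile, holding two positives out of two, has a lift of 4. With equal-sized portions the lifts always average to exactly 1, which is why a good model must pay for lift above 1 at the top with lift below 1 at the bottom.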

4.2.6.5 Pearson Correlation Coefficient

There are also some statistical measures that may be used to evaluate the performance of models. These measures are well known and can be found in many statistics textbooks. In this section, we examine the Pearson correlation
