Fig. 4.4 A typical hit curve (vertical axis: Hit-Rate [%]; horizontal axis: Quota).
positive instances in the entire dataset. Thus, the Qrecall for a quota of j
is defined as:
$$\mathrm{Qrecall}(j) = \frac{\sum_{k=1}^{j} t^{[k]}}{n^{+}}. \tag{4.12}$$
The denominator stands for the total number of instances in the entire
dataset whose actual class is positive. Formally, it can be calculated as:
$$n^{+} = \left|\{\langle x_i, y_i \rangle : y_i = \mathit{pos}\}\right|. \tag{4.13}$$
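The two formulas above can be illustrated with a minimal Python sketch. It assumes that labels are encoded as 1 for pos and 0 otherwise, and that instances are ranked by descending model score; the function name qrecall and its parameters are hypothetical and chosen only for this illustration.

```python
from typing import Sequence


def qrecall(scores: Sequence[float], labels: Sequence[int], quota: int) -> float:
    """Share of all positive instances found in the top `quota` ranked instances.

    Assumption: labels use 1 for pos and 0 otherwise, and a higher score
    means the instance is ranked earlier.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not 0 <= quota <= len(labels):
        raise ValueError("quota must be between 0 and the number of instances")

    # n+ as in Eq. (4.13): number of instances whose actual class is positive.
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("Qrecall is undefined when there are no positive instances")

    # Rank instances by descending score; ranked[k - 1] plays the role of t[k].
    ranked = [y for _, y in sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)]

    # Numerator of Eq. (4.12): positives among the first `quota` ranked instances.
    return sum(ranked[:quota]) / n_pos
```

For instance, with six instances of which three are positive, a quota that captures two of the positives gives a Qrecall of 2/3, and the value reaches 1 once the quota is large enough to cover all three.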
4.2.6.4 Lift Curve
A popular method of evaluating probabilistic models is lift. After a ranked
test set is divided into several portions (usually deciles), lift is calculated as
follows [ Coppock (2002) ]: the proportion of truly positive instances in a
specific decile is divided by the average proportion of truly positive instances
in the population. Regardless of how the test set is divided, a model is
considered good if its lift decreases as one proceeds toward the bottom of the
scoring list. A good model would present a lift greater than 1 in the top deciles
and a lift smaller than 1 in the last deciles. Figure 4.5 illustrates a lift chart
for a typical model prediction. Models can be compared by the lift of their top
portions, depending on the resources available and cost/benefit considerations.
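The lift calculation described above can be sketched as follows. This is an illustrative Python sketch, not a reference implementation: it assumes labels are 1 for positive and 0 otherwise, that the test set is ranked by descending score, and that portions are made as equal in size as integer division allows. The name lift_by_portion and its parameters are hypothetical.

```python
from typing import List, Sequence


def lift_by_portion(
    scores: Sequence[float], labels: Sequence[int], n_portions: int = 10
) -> List[float]:
    """Lift of each portion of a test set ranked by descending score.

    Assumption: labels use 1 for positive and 0 otherwise.
    """
    n = len(labels)
    if len(scores) != n:
        raise ValueError("scores and labels must have the same length")
    if n_portions < 1 or n < n_portions:
        raise ValueError("need at least one instance per portion")

    # Average proportion of positive instances in the whole population.
    overall_rate = sum(labels) / n
    if overall_rate == 0:
        raise ValueError("lift is undefined when there are no positive instances")

    # Rank the true labels by descending model score.
    ranked = [y for _, y in sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)]

    lifts = []
    for i in range(n_portions):
        # Integer boundaries keep every portion non-empty when n >= n_portions.
        start = i * n // n_portions
        end = (i + 1) * n // n_portions
        portion = ranked[start:end]
        portion_rate = sum(portion) / len(portion)
        lifts.append(portion_rate / overall_rate)
    return lifts
```

With the default of ten portions the result corresponds to lift per decile. A model that ranks well should yield values above 1 at the start of the returned list and below 1 toward its end.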
4.2.6.5 Pearson Correlation Coefficient
Some statistical measures can also be used to evaluate model performance.
These measures are well known and can be found in many statistics
textbooks. In this section, we examine the Pearson correlation