Table 4.4 Characteristics of Qrecall and Hit-rate.

| Parameter | Hit-rate | Qrecall |
|---|---|---|
| Function increasing/decreasing | Non-monotonic | Monotonically increasing |
| End point | Proportion of positive samples in the set | 1 |
| Sensitivity of the measure's value to positive instances | Very sensitive to positive instances at the top of the list. Less sensitive on going down to the bottom of the list. | Same sensitivity to positive instances in all places in the list. |
| Effect of negative class on the measure | A negative instance affects the measure and causes its value to decrease. | A negative instance does not affect the measure. |
| Range | 0 ≤ Hit-rate ≤ 1 | 0 ≤ Qrecall ≤ 1 |
random guess, without any learning) is a straight line (or semi-linear, because
the values are discrete) which starts at 0 (for zero quota size) and ends at 1.
Suppose now that a model gave an optimum prediction, meaning that
all positive instances are located at the head of the list and below them, all
the negative instances. In this case, the Qrecall curve climbs linearly until
a value of 1 is achieved at the point n⁺ (n⁺ = number of positive samples).
From that point on, any quota whose size is bigger than n⁺ fully extracts the
test set's potential, and the value 1 is kept until the end of the list.
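The two reference curves described above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: it assumes that Qrecall for a quota of size j is the share of all positive instances found among the top j entries of the ranked list, and the function names qrecall_curve and random_qrecall_curve are hypothetical, introduced here for the example.

```python
def qrecall_curve(ranked_labels):
    """Qrecall for every quota size 0..n over a ranked list of 0/1 labels.

    Assumption: Qrecall(j) = (positives among the top j) / (all positives).
    """
    n_pos = sum(ranked_labels)
    if n_pos == 0:
        raise ValueError("the list must contain at least one positive instance")
    curve = [0.0]  # a quota of size zero extracts nothing
    found = 0
    for label in ranked_labels:
        found += label
        curve.append(found / n_pos)
    return curve


def random_qrecall_curve(n):
    """Straight-line reference of a random model: from 0 at quota 0 to 1 at quota n."""
    return [j / n for j in range(n + 1)]


# Optimum prediction: all positives (1) at the head, all negatives (0) below.
optimal = qrecall_curve([1, 1, 1, 0, 0, 0, 0, 0])
baseline = random_qrecall_curve(8)
```

With three positive instances at the head of an eight-item list, the optimal curve reaches 1 at quota 3 and stays there, while the random reference rises evenly and reaches 1 only at quota 8.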
Note that a “good model”, which outperforms random classification,
though not an optimum one, will fall “on average” between these two curves.
It may sometimes drop below the random curve, but in general the area
enclosed between the “good model” curve and the random curve is larger above
the random curve than below it. If the opposite is true, then the model is a “bad
model” that does worse than a random guess.
The last observation leads us to consider a measure that evaluates
the performance of a model by summing the areas delineated between the
Qrecall curve of the examined model and the Qrecall curve of a random
model (which is linear). Areas above the linear curve are added and areas
below the linear curve are subtracted. The areas themselves are calculated
by subtracting the Qrecall of a random classification from the Qrecall of the
model's classification at every point, as shown in Figure 4.7. The areas where
the model performed better than a random guess increase the measure's
value, while the areas where the model performed worse than a random guess
decrease it. If the total area computed in this way is divided by the area delineated
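The signed summation described in this paragraph can be sketched as follows. This is a hedged illustration rather than a definitive implementation: it assumes that Qrecall for a quota of size j is the share of all positive instances found among the top j entries, that the random reference at quota j is j/n, and it stops at the unnormalized sum. The function name qrecall_area_vs_random is hypothetical.

```python
def qrecall_area_vs_random(ranked_labels):
    """Sum, over every quota size, of (model Qrecall - random Qrecall).

    Assumptions: Qrecall(j) = positives in the top j / all positives,
    and the random model's Qrecall at quota j is j / n.
    """
    n = len(ranked_labels)
    n_pos = sum(ranked_labels)
    if n_pos == 0:
        raise ValueError("the list must contain at least one positive instance")
    total = 0.0
    found = 0
    for j, label in enumerate(ranked_labels, start=1):
        found += label
        # Positive where the model beats a random guess, negative where it does worse.
        total += found / n_pos - j / n
    return total


good = qrecall_area_vs_random([1, 1, 0, 0])  # positives ranked first
bad = qrecall_area_vs_random([0, 0, 1, 1])   # positives ranked last
```

For these two four-item lists the sum is positive for the ranking that puts the positives first and negative for the ranking that puts them last, which matches the "good model" and "bad model" reading given earlier.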