Evaluation of Classification Trees - Data Mining with Decision Trees: Theory and Applications


4.2.6.2 Hit-Rate Curve

The hit-rate curve presents the hit ratio as a function of the quota size. Hit-rate is calculated by counting the instances that are actually labeled positive inside a determined quota [An and Wang (2001)]. More precisely, for a quota of size j and a ranked set of instances, hit-rate is defined as:

\[
\text{Hit-Rate}(j) = \frac{\sum_{k=1}^{j} t[k]}{j}, \tag{4.10}
\]

where t[k] represents the true outcome of the instance located in the k'th position when the instances are sorted by their conditional probability for “positive” in descending order. Note that if the k'th position can be uniquely defined (i.e. there is exactly one instance that can be located in this position), then t[k] is either 0 or 1, depending on the actual outcome of this specific instance. Nevertheless, if the k'th position is not uniquely defined and there are $m_{k,1}$ instances that can be located in this position, $m_{k,2}$ of which are truly positive, then:

\[
t[k] = \frac{m_{k,2}}{m_{k,1}}. \tag{4.11}
\]
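The definitions in Eqs. (4.10) and (4.11) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names `tie_corrected_outcomes` and `hit_rate` and the sample data are hypothetical, and ties are detected by exact equality of the probabilities, which suits a decision tree where every instance in a leaf receives the same value.

```python
from fractions import Fraction
from itertools import groupby


def tie_corrected_outcomes(probs, labels):
    """Return t[1..m] for instances ranked by descending probability.

    Instances that share a probability share a block of positions; every
    position in the block gets m_k2 / m_k1 (positives in block / block size).
    Assumption: ties are detected by exact equality of the probabilities.
    """
    ranked = sorted(zip(probs, labels), key=lambda pair: pair[0], reverse=True)
    t = []
    for _, group in groupby(ranked, key=lambda pair: pair[0]):
        block = [label for _, label in group]
        share = Fraction(sum(block), len(block))
        t.extend([share] * len(block))
    return t


def hit_rate(probs, labels, j):
    """Hit-Rate(j): average of t[k] over the first j ranked positions."""
    if not 1 <= j <= len(probs):
        raise ValueError("j must be between 1 and the number of instances")
    t = tie_corrected_outcomes(probs, labels)
    return sum(t[:j]) / j


# Hypothetical test set: three instances share the probability 0.6,
# as if they had fallen into the same leaf.
probs = [0.9, 0.6, 0.6, 0.6, 0.2]
labels = [1, 1, 0, 0, 1]
print(hit_rate(probs, labels, 2))  # 2/3
```

Exact fractions are used here only so that the tied shares (1/3 each in the sample) add up without rounding error; floating-point division would serve equally well in practice.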

The sum of t[k] over the entire test set is equal to the number of instances that are labeled “positive”. Moreover, Hit-Rate(j) ≈ Precision(p[j]), where p[j] denotes the j'th value in the ordering of $P_I(\text{pos} \mid x_1), \ldots, P_I(\text{pos} \mid x_m)$. The values are strictly equal when the j'th value is uniquely defined.
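As a small worked illustration of these two properties, consider a hypothetical test set of five instances (the numbers are invented for illustration) with probabilities 0.9, 0.7, 0.7, 0.4, 0.1 and actual labels 1, 1, 0, 0, 1. The sketch assumes that Precision(p) is computed over all instances whose probability is at least p; that definition is not restated in this section, so treat it as an assumption.

```latex
\begin{align*}
t[1..5] &= \left(1,\ \tfrac{1}{2},\ \tfrac{1}{2},\ 0,\ 1\right),
  \qquad \sum_{k=1}^{5} t[k] = 3 \quad \text{(the number of positives)} \\
\text{Hit-Rate}(1) &= \tfrac{1}{1} = 1 = \text{Precision}(0.9)
  \quad \text{(position 1 is unique)} \\
\text{Hit-Rate}(2) &= \tfrac{1 + 1/2}{2} = \tfrac{3}{4}, \qquad
  \text{Precision}(0.7) = \tfrac{2}{3}
  \quad \text{(position 2 is tied, so the values are only approximately equal)}
\end{align*}
```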

It should be noted that the hit-rate measure was originally defined without any reference to the uniqueness of a certain position. However, there are some classifiers that tend to provide the same conditional probability to several different instances. For instance, in a decision tree, all instances in the test set that belong to the same leaf get the same conditional probability. Thus, the proposed correction is required in those cases. Figure 4.4 illustrates a hit-rate curve.


4.2.6.3 Qrecall (Quota Recall)

The hit-rate measure, presented above, is the “precision” equivalent for quota-limited problems. Similarly, we suggest the Qrecall (for quota recall) to be the “recall” equivalent for quota-limited problems. The Qrecall for a certain position in a ranked list is calculated by dividing the number of positive instances, from the head of the list until that position, by the total
