who may be carrying dangerous instruments (such as scissors, penknives
and shaving blades). For this purpose the officer is using a classifier that is
capable of classifying each passenger either as class A, which means “Carry
dangerous instruments”, or as class B, “Safe”.
Suppose that searching a passenger is a time-consuming task and that
the security officer is capable of checking only 20 passengers prior to each
flight. If the classifier has labeled exactly 20 passengers as class A, then the
officer will check all of these passengers. However, if the classifier has labeled
more than 20 passengers as class A, then the officer is required to decide
which class A passengers should be ignored. On the other hand, if fewer than
20 people were classified as A, the officer, who must work constantly, has
to decide whom to check from among those classified as B after he has finished with
the class A passengers.
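The officer's three cases can be sketched in Python. This goes beyond what the text states: it assumes that the classifier also outputs an estimated probability that each passenger belongs to class A, and that the officer uses this probability to rank passengers. The function name `select_for_search` and the 0.5 labeling threshold are hypothetical choices made for illustration. They are not part of the original method.

```python
from typing import List, Sequence


def select_for_search(probs: Sequence[float], quota: int, threshold: float = 0.5) -> List[int]:
    """Return the indices of the passengers to search (hypothetical helper).

    Assumption: probs[i] is the classifier's estimated probability that
    passenger i belongs to class A ("Carry dangerous instruments"), and a
    passenger is labeled class A when that probability exceeds `threshold`.
    """
    # Rank all passengers from most to least suspicious (stable for ties).
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    class_a = [i for i in order if probs[i] > threshold]
    class_b = [i for i in order if probs[i] <= threshold]

    if len(class_a) >= quota:
        # Exactly `quota` or more class A passengers: keep the `quota`
        # most suspicious ones and ignore the rest.
        return class_a[:quota]
    # Fewer than `quota` class A passengers: search all of them, then fill
    # the remaining slots with the most suspicious class B passengers.
    return class_a + class_b[: quota - len(class_a)]
```

For example, `select_for_search([0.9, 0.2, 0.7, 0.4, 0.95, 0.1], quota=4)` returns `[4, 0, 2, 3]`: the three class A passengers, followed by the most suspicious class B passenger. Both lists come from the same descending ranking, so in this sketch all three cases reduce to searching the `quota` highest-ranked passengers. The threshold only decides which of them are called class A.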
There are also cases in which a quota limitation is known to exist but its
size is not known in advance. Nevertheless, the decision maker would like
to evaluate the expected performance of the classifier. Such cases occur,
for example, in some countries regarding the number of undergraduate
students that can be accepted to a certain department in a state university.
The actual quota for a given year is set according to different parameters,
including the governmental budget. In this case, the decision maker would like
to evaluate several classifiers for selecting the applicants without knowing
the actual quota size. Finding the most appropriate classifier in advance
is important because the chosen classifier can dictate which attributes are
important, i.e. the information that the applicant should provide to the
registration and admission unit.
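The sketch below shows one way to make "expected performance under an unknown quota" concrete. It rests on several hypothetical assumptions. First, the decision maker can put a probability distribution over plausible quota sizes. Second, each candidate classifier ranks applicants by a score. Third, performance at a given quota is the fraction of admitted applicants who are truly "positive" (for instance, applicants who would succeed in the program). The names `precision_at_quota`, `expected_precision`, and `quota_dist` are illustrative and do not come from the source.

```python
from typing import Dict, Sequence


def precision_at_quota(scores: Sequence[float], labels: Sequence[bool], quota: int) -> float:
    """Fraction of the `quota` highest-scored applicants whose true label is positive."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    selected = order[:quota]
    if not selected:
        return 0.0
    return sum(labels[i] for i in selected) / len(selected)


def expected_precision(
    scores: Sequence[float], labels: Sequence[bool], quota_dist: Dict[int, float]
) -> float:
    """Precision at each possible quota, weighted by an assumed quota distribution.

    quota_dist maps a candidate quota size to its (hypothetical) probability;
    the weights are normalized, so they need not sum exactly to 1.
    """
    total = sum(quota_dist.values())
    weighted = sum(p * precision_at_quota(scores, labels, q) for q, p in quota_dist.items())
    return weighted / total
```

Consider `labels = [True, True, False, False]` and `quota_dist = {1: 0.25, 2: 0.5, 3: 0.25}`. A classifier scoring `[0.9, 0.8, 0.3, 0.1]` has an expected precision of 11/12 ≈ 0.917. One scoring `[0.2, 0.9, 0.8, 0.1]` gets 2/3. Under these assumptions the first classifier would be preferred, even though the actual quota is not yet known.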
In probabilistic classifiers, the above-mentioned definitions of precision
and recall can be extended and defined as a function of a probability
threshold $\tau$. Suppose we evaluate a classifier on a given test set that
consists of $n$ instances, denoted as $(\langle x_1, y_1\rangle, \ldots, \langle x_n, y_n\rangle)$, such that $x_i$
represents the input feature vector of instance $i$ and $y_i$ represents its true
class (“positive” or “negative”). Then:
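$$\mathrm{Precision}(\tau) = \frac{\left|\{\langle x_i, y_i\rangle : P_{DT}(pos \mid x_i) > \tau,\ y_i = pos\}\right|}{\left|\{\langle x_i, y_i\rangle : P_{DT}(pos \mid x_i) > \tau\}\right|}, \tag{4.8}$$

$$\mathrm{Recall}(\tau) = \frac{\left|\{\langle x_i, y_i\rangle : P_{DT}(pos \mid x_i) > \tau,\ y_i = pos\}\right|}{\left|\{\langle x_i, y_i\rangle : y_i = pos\}\right|}, \tag{4.9}$$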
where $DT$ represents a probabilistic classifier that is used to estimate the
conditional likelihood that an observation $x_i$ belongs to the “positive” class, which is denoted as $P_{DT}(pos \mid x_i)$.
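Equations (4.8) and (4.9) translate directly into code. The sketch below assumes that the classifier's estimates $P_{DT}(pos \mid x_i)$ are already available as a list of floats, and that true classes are encoded as the strings `"pos"` and `"neg"`. The function name `precision_recall_at` is an assumption. So is returning 0.0 when a denominator is empty, because the equations leave that case undefined.

```python
from typing import Sequence, Tuple


def precision_recall_at(
    probs: Sequence[float], labels: Sequence[str], tau: float
) -> Tuple[float, float]:
    """Compute Precision(tau) and Recall(tau) following Eqs. (4.8) and (4.9).

    probs[i]:  P_DT(pos | x_i), the classifier's estimate for instance i
    labels[i]: the true class y_i, either "pos" or "neg"
    """
    predicted_pos = [p > tau for p in probs]  # strict inequality, as in the equations
    actual_pos = [y == "pos" for y in labels]
    # Numerator shared by both measures: P_DT(pos | x_i) > tau and y_i = pos.
    true_pos = sum(pp and ap for pp, ap in zip(predicted_pos, actual_pos))
    n_predicted = sum(predicted_pos)  # denominator of Eq. (4.8)
    n_actual = sum(actual_pos)        # denominator of Eq. (4.9)
    # Assumed convention: an empty denominator yields 0.0 instead of raising.
    precision = true_pos / n_predicted if n_predicted else 0.0
    recall = true_pos / n_actual if n_actual else 0.0
    return precision, recall
```

Evaluating the same test set at several thresholds shows how precision and recall trade off. Take `probs = [0.9, 0.8, 0.6, 0.4, 0.2]` and `labels = ["pos", "neg", "pos", "pos", "neg"]`. Then τ = 0.0 gives (0.6, 1.0), τ = 0.5 gives (2/3, 2/3), and τ = 0.7 gives (0.5, 1/3).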