Chapter 2
Continuous Risk Functionals
As explained in the preceding chapter, the learning algorithm needed to adequately tune a regression-like classifier, based on the information provided by a training set, consists of the minimization of a quantity called risk, whose expression is given by formula (1.7). This formula assigns a number, $R_L(y_w)$, to a function $y_w$, i.e., the formula is an instantiation of a

$$Y_W = \{ y_w \} \mapsto \mathbb{R}$$

mapping. Such a mapping type (from a set of functions onto a set of numbers) is called a functional. The risk functional, expressed in terms of a continuous and differentiable loss function $L(t(x), y_w(x))$, is minimized by some algorithm attempting to find a classifier with a probability of error hopefully close to that of $z_w$: $\min P_e$. From now on we assume that the class-conditional distributions are continuous¹ and, as a consequence, the risk functional can be expressed as

$$R_L(Y) = \sum_{t \in T} P(t) \int_{X|t} L(t(x), y(x))\, f_{X|t}(x)\, dx , \qquad (2.1)$$

where $f_{X|t}(x)$ is the class-conditional density (likelihood) of the data, $P(t)$ is the prior probability of the class (with target value) $t$, and $y \equiv y_w$. We have already pointed out in the preceding chapter that $R_L(Y)$ is an expected value:

$$R_L(Y) = E_{X,T}[L(t(X), y(X))] = \sum_{t \in T} P(t)\, E_{X|t}[L(t, y(X))] . \qquad (2.2)$$
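To make (2.1) concrete, here is a minimal numerical sketch. All the specific ingredients below are illustrative assumptions, not from the text: two classes with target values $t \in \{-1, +1\}$ and equal priors, Gaussian class-conditional densities $f_{X|t} = N(t, 1)$, squared-error loss $L(t, y) = (t - y)^2$, and a hypothetical classifier $y_w(x) = \tanh(x)$. The risk is then a prior-weighted sum of per-class integrals, approximated on a fine grid:

```python
import numpy as np

# Assumed two-class setup: targets -1 and +1 with equal priors P(t) = 1/2.
priors = {-1.0: 0.5, 1.0: 0.5}

def f_cond(x, t):
    # Class-conditional density f_{X|t}(x): Gaussian with mean t, unit variance.
    return np.exp(-0.5 * (x - t) ** 2) / np.sqrt(2.0 * np.pi)

def y_w(x):
    # The (assumed) regression-like classifier being evaluated.
    return np.tanh(x)

# Riemann-sum approximation of each class integral in (2.1),
# then the prior-weighted sum over the two classes.
x = np.linspace(-10.0, 10.0, 200_001)
dx = x[1] - x[0]
risk = sum(p * np.sum((t - y_w(x)) ** 2 * f_cond(x, t)) * dx
           for t, p in priors.items())
print(risk)
```

A finer grid or proper quadrature would tighten the approximation; the point is only that (2.1) reduces to a per-class integral weighted by the priors.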
As also mentioned in Chap. 1, since the class-conditional densities are usually unknown, the minimization carried out by learning algorithms is performed on an empirical estimate of (2.1) (also known as the Resubstitution estimate), $\hat{R}_L(Y)$, expressed by formula (1.8). With mild conditions on $L$ (measurability), the empirical estimate $\hat{R}_L(Y)$ converges to $R_L(Y)$ almost surely as $n \to \infty$.
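The almost-sure convergence of the empirical estimate can be illustrated by Monte Carlo. The ingredients below are again assumed for illustration (equal priors, $f_{X|t} = N(t, 1)$, squared-error loss, classifier $\tanh(x)$): the sample mean of the loss over $n$ training pairs stabilizes as $n$ grows, by the law of large numbers.

```python
import numpy as np

rng = np.random.default_rng(0)

def y_w(x):
    # The (assumed) classifier under evaluation.
    return np.tanh(x)

def empirical_risk(n):
    # Draw n training pairs (x_i, t_i) and average the loss over them,
    # i.e. the resubstitution-style estimate (1/n) sum_i L(t_i, y(x_i)).
    t = rng.choice([-1.0, 1.0], size=n)   # targets, equal priors
    x = rng.normal(loc=t, scale=1.0)      # x_i drawn from f_{X|t_i} = N(t_i, 1)
    return float(np.mean((t - y_w(x)) ** 2))

# Estimates fluctuate for small n and settle down as n grows.
for n in (100, 10_000, 1_000_000):
    print(n, empirical_risk(n))
```

Two independent large-sample estimates agree to within sampling noise, which is the practical face of the almost-sure convergence claimed above.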
¹ In order to have a density the distribution must be absolutely continuous. However, since we will not consider datasets exhibiting exotic continuous singular distributions, the continuity assumption suffices.
 