Chapter 2
Continuous Risk Functionals
As explained in the preceding chapter, the learning algorithm needed to adequately tune a regression-like classifier, based on the information provided by a training set, consists of the minimization of a quantity called risk, whose expression is given by formula (1.7). This formula assigns a number, $R_L(y_w)$, to a function $y_w$, i.e., the formula is an instantiation of a $y_w \mapsto \mathbb{R}$ mapping. Such a mapping type (from a set of functions onto a set of numbers) is called a functional. The risk functional, expressed in terms of a continuous and differentiable loss function $L(t(x), y_w(x))$, is minimized by some algorithm attempting to find a classifier with a probability of error hopefully close to that of $z_{w^*}$: $\min P_e$. From now on we assume that the class-conditional distributions are continuous¹ and, as a consequence, the risk functional can be expressed as
    $R_L(y) = \sum_{t \in T} P(t) \int_{X|t} L(t(x), y(x)) \, f_{X|t}(x) \, dx$ ,    (2.1)
where $f_{X|t}(x)$ is the class-conditional density (likelihood) of the data, $P(t)$ is the prior probability of class (with target value) $t$, and $y \equiv y_w$. We have already pointed out in the preceding chapter that $R_L(y)$ is an expected value:

    $R_L(y) = \sum_{t \in T} P(t) \, E_{X|t}[L(t, y(X))] \equiv E_{X,T}[L(t(X), y(X))]$ .    (2.2)
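As a concrete illustration, formulas (2.1) and (2.2) can be checked numerically on a hypothetical two-class problem. Everything in the sketch below — the Gaussian class-conditional densities, the priors, the squared-error loss, and the classifier output $y(x) = \tanh(x)$ — is an assumption chosen for the example, not something fixed by the text:

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

# Hypothetical two-class problem (all values are assumptions for this sketch):
# targets t in {-1, +1}, Gaussian class-conditional densities f_{X|t}.
priors = {-1.0: 0.4, +1.0: 0.6}                       # P(t)
cond = {-1.0: norm(-1.0, 1.0), +1.0: norm(2.0, 1.0)}  # f_{X|t}

def y(x):
    # A fixed classifier output y_w(x); any continuous function will do here.
    return np.tanh(x)

def loss(t, yx):
    # A continuous, differentiable loss L(t, y(x)): squared error.
    return (t - yx) ** 2

# Formula (2.1): R_L(y) = sum over t of P(t) * integral of L(t, y(x)) f_{X|t}(x) dx.
risk_21 = sum(
    p * quad(lambda x, t=t: loss(t, y(x)) * cond[t].pdf(x), -np.inf, np.inf)[0]
    for t, p in priors.items()
)

# Formula (2.2): the same number as the expectation E_{X,T}[L(t(X), y(X))],
# approximated here by Monte Carlo sampling of the joint pair (X, T).
rng = np.random.default_rng(0)
n = 200_000
ts = rng.choice([-1.0, 1.0], size=n, p=[0.4, 0.6])
xs = rng.normal(np.where(ts < 0, -1.0, 2.0), 1.0)
risk_22 = loss(ts, y(xs)).mean()

print(risk_21, risk_22)  # both evaluate the same risk; values agree closely
```

The two numbers agree up to Monte Carlo error, since (2.2) is just (2.1) rewritten by conditioning the joint expectation on the class.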
As also mentioned in Chap. 1, since the class-conditional densities are usually unknown, the minimization carried out by learning algorithms is performed on an empirical estimate of (2.1) (also known as the resubstitution estimate), $\hat{R}_L(y)$, expressed by formula (1.8). Under mild conditions on $L$ (measurability), the empirical estimate $\hat{R}_L(y)$ converges to $R_L(y)$ almost surely as $n \to \infty$.
¹ In order to have a density the distribution must be absolutely continuous. However, since we will not consider datasets exhibiting exotic continuous singular distributions, the continuity assumption suffices.