Note that for simplicity of notation we write $R_L(Y)$ and $\hat{R}_L(Y)$ instead of $R_L(Y_w)$ and $\hat{R}_L(Y_w)$. One must keep in mind that the minimization of both risks is with respect to $w \in W$. For a fixed family of output functions $y_w$, we could as well denote the risk as $R_L(w)$.
We now proceed to surveying the two classic risk functionals that have
been almost exclusively used in learning algorithms, before introducing the
new risk functionals that are the keystones of the present topic.
2.1 Classic Risk Functionals
2.1.1 The Mean-Square-Error Risk
The oldest, and still the most popular, continuous and differentiable loss function is the square-error (SE) function

$$L_{SE}(t(x), y(x)) = (t(x) - y(x))^2, \tag{2.3}$$

with corresponding risk functional

$$R_{MSE}(Y) = \sum_{t \in T} P(t) \int_X (t(x) - y(x))^2 f_{X|t}(x)\,dx. \tag{2.4}$$
The empirical estimate of this functional is

$$\hat{R}_{MSE}(Y) = \frac{1}{n}\sum_{i=1}^{n}(t_i - y_i)^2 \tag{2.5}$$

with $t_i = t(x_i)$ and $y_i = y(x_i) \equiv y_w(x_i)$.
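As a concrete illustration, the empirical MSE risk of formula (2.5) is simply the mean of the squared deviations between targets and model outputs. A minimal sketch in Python (the target and output values here are invented for illustration):

```python
import numpy as np

def empirical_mse(t, y):
    """Empirical MSE risk (2.5): mean of the squared deviations (t_i - y_i)^2."""
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    return np.mean((t - y) ** 2)

# Illustrative targets t_i and model outputs y_i = y_w(x_i)
t = [1.0, 0.0, 1.0, 1.0]
y = [0.9, 0.2, 0.8, 1.0]
print(empirical_mse(t, y))  # (0.01 + 0.04 + 0.04 + 0) / 4 = 0.0225
```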
The empirical risk expressed by formula (2.5) corresponds to the well-known mean-square-error (MSE) method introduced by Gauss in the late 18th century as a means of adjusting a function to a set of observations. In the present context the observations are the $t_i$ and we try to fit the $y_w(x_i)$ to the $t_i$, for a set of predictor values $x_i$. Formula (2.5) penalizes the deviations of $y_w(x_i)$ from $t_i$ according to a square law, therefore emphasizing large deviations between observed and predicted values. The square law (2.3) is a distance measure (for both sequences and functions), and is still the preferred measure in regression because of its several important properties and its mathematical tractability.
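The emphasis the square law places on large deviations is easy to see numerically: doubling an error quadruples its penalty, so a single outlier can dominate the risk. A short sketch (the deviation values are invented for illustration):

```python
import numpy as np

# Three small deviations and one large one
dev = np.array([0.1, 0.1, 0.1, 2.0])
sq = dev ** 2

# Fraction of the total squared error contributed by each deviation:
# the single large deviation accounts for more than 99% of the risk.
print(sq / sq.sum())
```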
Let us consider that, to some deterministic data generating process $g(X)$, some noise, $\xi(X)$, is added: $Z = g(X) + \xi(X)$. $X$ and $\xi$ are both random variables. The minimum mean-square-error (MMSE) estimate $Y = f(X)$ of $g(X)$ based on $Z$ and the square-error measure, i.e., the $\min_Y E[(Z - Y)^2]$ solution, turns out to be the conditional expectation of $Z$ given $X$: $Y = E[Z|X]$.
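This property can be checked by Monte Carlo simulation: with zero-mean noise the conditional expectation is $g(X)$ itself, and any biased alternative pays an extra penalty equal to the squared bias. In the following sketch, the choice of $g$, the noise level, and the sample size are all illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Z = g(X) + xi with zero-mean noise, so E[Z | X] = g(X).
def g(x):
    return np.sin(x)  # illustrative deterministic process

x = rng.uniform(0.0, 2.0 * np.pi, 100_000)
z = g(x) + rng.normal(0.0, 0.3, x.size)  # noise std 0.3

def mse(y):
    """Empirical estimate of E[(Z - Y)^2] for a candidate estimator y."""
    return np.mean((z - y) ** 2)

# The conditional expectation attains (approximately) the noise
# variance 0.3**2 = 0.09; a biased estimator adds bias^2 on top.
print(mse(g(x)))        # close to 0.09
print(mse(g(x) + 0.2))  # close to 0.09 + 0.04 = 0.13
```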
 