Note that for simplicity of notation we write $R_L(Y)$ and $\hat{R}_L(Y)$ instead of $R_L(Y_w)$ and $\hat{R}_L(Y_w)$. One must keep in mind that the minimization of both risks is with respect to $w \in W$. For a fixed family of output functions $y_w$, we could as well denote the risk as $R_L(w)$.
We now proceed to surveying the two classic risk functionals that have
been almost exclusively used in learning algorithms, before introducing the
new risk functionals that are the keystones of the present topic.
2.1 Classic Risk Functionals
2.1.1 The Mean-Square-Error Risk
The oldest, and still the most popular, continuous and differentiable loss function is the square-error (SE) function

$$L_{SE}(t(x), y(x)) = (t(x) - y(x))^2, \tag{2.3}$$

with corresponding risk functional

$$R_{MSE}(Y) = \sum_{t \in T} P(t) \int_X (t(x) - y(x))^2 f_{X|t}(x)\,dx. \tag{2.4}$$
The empirical estimate of this functional is

$$\hat{R}_{MSE}(Y) = \frac{1}{n}\sum_{i=1}^{n}(t_i - y_i)^2 \tag{2.5}$$

with $t_i = t(x_i)$ and $y_i = y(x_i) \equiv y_w(x_i)$.
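As a concrete illustration, the empirical MSE risk of formula (2.5) is simply the mean of the squared deviations between targets and model outputs. A minimal sketch in Python (the target and output values here are invented for illustration):

```python
import numpy as np

def empirical_mse(t, y):
    """Empirical MSE risk (2.5): mean of the squared deviations (t_i - y_i)^2."""
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    return np.mean((t - y) ** 2)

# Illustrative targets t_i and model outputs y_i = y_w(x_i)
t = [1.0, 0.0, 1.0, 1.0]
y = [0.9, 0.2, 0.8, 1.0]
print(empirical_mse(t, y))  # (0.01 + 0.04 + 0.04 + 0) / 4 = 0.0225
```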
The empirical risk expressed by formula (2.5) corresponds to the well-known mean-square-error (MSE) method introduced by Gauss in the late 18th century as a means of adjusting a function to a set of observations. In the present context the observations are the $t_i$ and we try to fit the $y_w(x_i)$ to the $t_i$, for a set of predictor values $x_i$. Formula (2.5) penalizes the deviations of $y_w(x_i)$ from $t_i$ according to a square law, therefore emphasizing large deviations between observed and predicted values. The square law (2.3) is a distance measure (for both sequences and functions), and is still the preferred measure in regression because of its several important properties and its mathematical tractability.
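The emphasis the square law places on large deviations is easy to see numerically: doubling an error quadruples its penalty, so a single outlier can dominate the risk. A short sketch (the deviation values are invented for illustration):

```python
import numpy as np

# Three small deviations and one large one
dev = np.array([0.1, 0.1, 0.1, 2.0])
sq = dev ** 2

# Fraction of the total squared error contributed by each deviation:
# the single large deviation accounts for more than 99% of the risk.
print(sq / sq.sum())
```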
Let us consider that, to some deterministic data generating process $g(X)$, some noise, $\xi(X)$, is added: $Z = g(X) + \xi(X)$. $X$ and $\xi$ are both random variables. The minimum mean-square-error (MMSE) estimate $Y = f(X)$ of $g(X)$ based on $Z$ and the square-error measure, i.e., the $\min_Y E[(Z - Y)^2]$ solution, turns out to be the conditional expectation of $Z$ given $X$: $Y = E[Z|X]$.
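This property can be checked by Monte Carlo simulation: with zero-mean noise the conditional expectation is $g(X)$ itself, and any biased alternative pays an extra penalty equal to the squared bias. In the following sketch, the choice of $g$, the noise level, and the sample size are all illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Z = g(X) + xi with zero-mean noise, so E[Z | X] = g(X).
def g(x):
    return np.sin(x)  # illustrative deterministic process

x = rng.uniform(0.0, 2.0 * np.pi, 100_000)
z = g(x) + rng.normal(0.0, 0.3, x.size)  # noise std 0.3

def mse(y):
    """Empirical estimate of E[(Z - Y)^2] for a candidate estimator y."""
    return np.mean((z - y) ** 2)

# The conditional expectation attains (approximately) the noise
# variance 0.3**2 = 0.09; a biased estimator adds bias^2 on top.
print(mse(g(x)))        # close to 0.09
print(mse(g(x) + 0.2))  # close to 0.09 + 0.04 = 0.13
```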
 