$$
R_L(Y) = \sum_{t \in T} P(t)\, E_{Y|t}[L(t, Y)]
       = \sum_{t \in \{-1,1\}} P(t) \int_{-1}^{1} L(t, y)\, f_{Y|t}(y)\, dy ,
\qquad (2.25)
$$

if the absolute integrability condition (2.23) for $L(t, y)$ is satisfied.
Applying again Theorem 2.1 to $E = T - Y$, the risk functional (2.25) is finally expressed in terms of the error variable as

$$
R_L(E) = \sum_{t \in \{-1,1\}} P(t) \int_{t-1}^{t+1} L(t, e)\, f_{E|t}(e)\, de .
\qquad (2.26)
$$
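As a quick numerical illustration of the change of variables behind (2.25) and (2.26), the following Python sketch estimates both forms by Monte Carlo for the squared loss. The class priors and the class-conditional output model are assumptions made for this example only; the two estimates should agree up to sampling noise.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative two-class setup with {-1, 1} targets; the priors and the
# class-conditional output model below are assumptions for this sketch.
priors = {-1: 0.4, 1: 0.6}
n = 200_000

def sample_y_given_t(t, size):
    # Assumed class-conditional output distribution f_{Y|t}, restricted to [-1, 1].
    return np.clip(rng.normal(loc=0.5 * t, scale=0.4, size=size), -1.0, 1.0)

# Risk as in (2.25): average of L(t, y) = (t - y)^2 over Y given t, weighted by P(t).
risk_y = sum(p * np.mean((t - sample_y_given_t(t, n)) ** 2) for t, p in priors.items())

# Risk as in (2.26): average of L(t, e) = e^2 over E given t, with E = T - Y.
risk_e = 0.0
for t, p in priors.items():
    e = t - sample_y_given_t(t, n)      # error samples, supported on [t-1, t+1]
    risk_e += p * np.mean(e ** 2)

print(risk_y, risk_e)                   # agreement up to Monte Carlo noise
```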
For MSE, $L_{SE}(t, e) = (t - y)^2 = e^2$ depends only on $e$ (or, in more detail, $e_w = t - y_w$). We then have $R_{MSE}(E) = E_{T,E}[E^2]$, the second-order moment of the error, which is empirically estimated as in (2.5) and can be rewritten as

$$
R_{MSE}(Y) = \frac{1}{n}\left[ \sum_{t_i = -1} (t_i - y_i)^2 + \sum_{t_i = 1} (t_i - y_i)^2 \right].
\qquad (2.27)
$$
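A minimal sketch of the empirical estimate (2.27) on made-up targets and outputs (illustrative values only, not from the text); the class-split form gives the same value as the pooled mean of the squared errors $e_i = t_i - y_i$.

```python
import numpy as np

# Illustrative data: targets in {-1, 1} and classifier outputs in [-1, 1] (made up).
t = np.array([-1, -1, 1, 1, 1])
y = np.array([-0.8, -0.2, 0.9, 0.4, 0.7])
n = len(t)

# Class-split form of (2.27): one sum per target value, divided by n.
r_mse_split = (((t[t == -1] - y[t == -1]) ** 2).sum()
               + ((t[t == 1] - y[t == 1]) ** 2).sum()) / n

# Pooled form: mean of the squared errors e_i = t_i - y_i.
r_mse_pooled = np.mean((t - y) ** 2)

print(r_mse_split, r_mse_pooled)   # identical up to floating-point rounding
```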
Let us now consider the cross-entropy risk whose empirical estimate is given by formula (2.16). For a two-class problem and the $\{0, 1\}$-coding scheme one obtains the following popularized expression, when the classifier has a single output:

$$
R_{CE}(Y) = -\sum_{t_i = 0} (1 - t_i)\ln(1 - y_i) - \sum_{t_i = 1} t_i \ln(y_i) .
\qquad (2.28)
$$
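The sketch below evaluates (2.28) on made-up $\{0, 1\}$-coded data (illustrative values only). Since the factors $(1 - t_i)$ and $t_i$ equal one on their respective sums, the class-split form coincides with the usual pooled cross-entropy sum.

```python
import numpy as np

# Illustrative {0, 1}-coded targets and single-output classifier values in (0, 1) (made up).
t = np.array([0, 0, 1, 1, 1])
y = np.array([0.1, 0.3, 0.8, 0.6, 0.9])

# Class-split form of (2.28); the factors (1 - t_i) and t_i are 1 on their own sums.
r_ce = (-np.sum((1 - t[t == 0]) * np.log(1 - y[t == 0]))
        - np.sum(t[t == 1] * np.log(y[t == 1])))

# Equivalent pooled cross-entropy sum over all cases.
r_ce_pooled = -np.sum(t * np.log(y) + (1 - t) * np.log(1 - y))

print(r_ce, r_ce_pooled)   # the two values coincide
```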
The $\{-1, 1\}$-coding implies a $y \rightarrow (y+1)/2$ transformation; formula (2.28) is then rewritten as

$$
R_{CE}(Y) = -\sum_{t_i = -1} \ln\frac{1 - y_i}{2} - \sum_{t_i = 1} \ln\frac{1 + y_i}{2} .
\qquad (2.29)
$$
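A small check of this recoding step, again on made-up data: evaluating (2.29) directly in the $\{-1, 1\}$ coding gives the same number as mapping targets and outputs through $y \rightarrow (y+1)/2$ and applying (2.28).

```python
import numpy as np

# Illustrative {-1, 1}-coded targets and outputs in (-1, 1) (made up).
t = np.array([-1, -1, 1, 1, 1])
y = np.array([-0.8, -0.4, 0.6, 0.2, 0.8])

# Direct evaluation of (2.29) in the {-1, 1} coding.
r_ce_pm1 = (-np.sum(np.log((1 - y[t == -1]) / 2))
            - np.sum(np.log((1 + y[t == 1]) / 2)))

# Same value via the {0, 1} coding and formula (2.28).
t01, y01 = (t + 1) / 2, (y + 1) / 2
r_ce_01 = -np.sum(np.log(1 - y01[t01 == 0])) - np.sum(np.log(y01[t01 == 1]))

print(r_ce_pm1, r_ce_01)   # identical up to floating-point rounding
```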
When multiplied by $1/n$, $R_{CE}(Y)$ can be viewed as the empirical estimate of the following (theoretical) risk functional:
$$
R_{CE}(Y) = -P(-1) \int_{-1}^{1} \ln(1 - y)\, f_{Y|-1}(y)\, dy
            - P(1) \int_{-1}^{1} \ln(1 + y)\, f_{Y|1}(y)\, dy + \ln(2) .
\qquad (2.30)
$$
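To illustrate this estimation statement, the sketch below draws a large labelled sample from an assumed class-conditional output density, $f_{Y|t}(y) = \tfrac{3}{4}(1 + ty)(1 - y^2)$ on $[-1, 1]$ (an assumption for this example, together with the priors), and compares the empirical sum (2.29) scaled by $1/n$ with the value of (2.30) obtained by numerical integration.

```python
import numpy as np

rng = np.random.default_rng(1)
priors = {-1: 0.4, 1: 0.6}               # assumed class priors P(t)

def f_y_given_t(y, t):
    # Assumed class-conditional density on [-1, 1]; integrates to one for t in {-1, 1}.
    return 0.75 * (1.0 + t * y) * (1.0 - y ** 2)

def sample_y_given_t(t, size):
    # Rejection sampling from the assumed density (envelope 1.5 exceeds its maximum).
    out = np.empty(0)
    while out.size < size:
        cand = rng.uniform(-1.0, 1.0, size)
        keep = rng.uniform(0.0, 1.5, size) < f_y_given_t(cand, t)
        out = np.concatenate([out, cand[keep]])
    return out[:size]

# Draw a labelled sample of size n.
n = 100_000
t = rng.choice([-1, 1], size=n, p=[priors[-1], priors[1]])
y = np.empty(n)
for c in (-1, 1):
    y[t == c] = sample_y_given_t(c, np.count_nonzero(t == c))

# Empirical sum (2.29), scaled by 1/n.
r_emp = (-np.sum(np.log((1 - y[t == -1]) / 2))
         - np.sum(np.log((1 + y[t == 1]) / 2))) / n

# Theoretical value (2.30) by numerical integration (open grid avoids log(0) at the ends).
g = np.linspace(-1.0, 1.0, 400001)[1:-1]
r_theo = (-priors[-1] * np.trapz(np.log(1 - g) * f_y_given_t(g, -1), g)
          - priors[1] * np.trapz(np.log(1 + g) * f_y_given_t(g, 1), g)
          + np.log(2.0))

print(r_emp, r_theo)                      # close for large n
```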
Applying the same variable transformation as we did before, the CE risk
functional is finally expressed in terms of the error variable as
$$
R_{CE}(E) = \sum_{t \in \{-1,1\}} P(t) \int_{t-1}^{t+1} \ln\frac{1}{2 - te}\, f_{E|t}(e)\, de + \ln(2) .
\qquad (2.31)
$$
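As a final consistency check, the sketch below evaluates (2.30) and (2.31) by numerical integration under the same assumed density used above, $f_{Y|t}(y) = \tfrac{3}{4}(1 + ty)(1 - y^2)$, with $f_{E|t}(e) = f_{Y|t}(t - e)$ on $[t-1, t+1]$; the two forms should produce the same value.

```python
import numpy as np

priors = {-1: 0.4, 1: 0.6}                # assumed class priors P(t)

def f_y_given_t(y, t):
    # Assumed class-conditional output density on [-1, 1] (illustration only).
    return 0.75 * (1.0 + t * y) * (1.0 - y ** 2)

# (2.30): CE risk written in terms of the output variable Y.
y = np.linspace(-1.0, 1.0, 400001)[1:-1]  # open grid avoids log(0) at the endpoints
r_ce_y = (-priors[-1] * np.trapz(np.log(1 - y) * f_y_given_t(y, -1), y)
          - priors[1] * np.trapz(np.log(1 + y) * f_y_given_t(y, 1), y)
          + np.log(2.0))

# (2.31): the same risk written in terms of the error E = T - Y,
# using f_{E|t}(e) = f_{Y|t}(t - e) on [t-1, t+1].
r_ce_e = np.log(2.0)
for t, p in priors.items():
    e = np.linspace(t - 1.0, t + 1.0, 400001)[1:-1]
    r_ce_e += p * np.trapz(np.log(1.0 / (2.0 - t * e)) * f_y_given_t(t - e, t), e)

print(r_ce_y, r_ce_e)                     # the two values should agree closely
```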