Continuous Risk Functionals - Minimum Error Entropy Classification

$\mathbb{E}[Z \mid X]$ (see e.g., [136]). This is the usual regression solution of $Z$ predicted by $X$. One of the reasons why the MMSE estimate $Y$ is so highly regarded in regression problems is that it is the optimal one, in the sense that it affords the minimum $\mathbb{E}[L(Z - Y)]$ for a class of convex, symmetric, and unimodal loss functions $L$, when $g(X)$ is linear and $X$ and $\xi$ are Gaussian [208, 88]. Furthermore, when the noise is independent of $X$ and has zero mean, the conditional expectation factors out as $Y = \mathbb{E}[g(X) \mid X] + \mathbb{E}[\xi(X) \mid X] = g(X)$. One is then able to retrieve $g(X)$ from $Z$.
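The claim that $g(X)$ can be retrieved from $Z$ can be checked numerically. The following is a minimal sketch, assuming a hypothetical linear $g(x) = 2x + 1$, Gaussian $X$, and zero-mean Gaussian noise independent of $X$; none of these specific choices come from the text. It compares the empirical mean squared error of $g$ itself with that of two perturbed predictors.

```python
import random

def g(x):
    # Hypothetical linear regression function, chosen only for illustration.
    return 2.0 * x + 1.0

def empirical_mse(predict, xs, zs):
    # Average squared difference between observations z and predictions.
    return sum((z - predict(x)) ** 2 for x, z in zip(xs, zs)) / len(xs)

rng = random.Random(0)  # fixed seed so the run is reproducible
xs = [rng.gauss(0.0, 1.0) for _ in range(20000)]
# Z = g(X) + noise, with zero-mean noise (std 0.5) drawn independently of X.
zs = [g(x) + rng.gauss(0.0, 0.5) for x in xs]

mse_g = empirical_mse(g, xs, zs)                            # predictor Y = g(X)
mse_shifted = empirical_mse(lambda x: g(x) + 0.3, xs, zs)   # biased predictor
mse_tilted = empirical_mse(lambda x: 2.5 * x + 1.0, xs, zs) # wrong slope

print(mse_g, mse_shifted, mse_tilted)
```

Under these assumptions the error of $g$ should be close to the noise variance (0.25), and both perturbed predictors should do worse.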

For classification problems the MMSE solution also enjoys important properties. Instead of deriving these properties from the regression setting (applying the above $Z = g(X) + \xi(X)$ model to classification raises mathematical difficulties), they can be derived [83, 185, 26, 252] by first observing that the empirical MSE risk, $\hat{R}_{\mathrm{MSE}}$, for a classifier with $c$ target values $t_k$ and outputs $y_k$ is written as

$$\hat{R}_{\mathrm{MSE}} = \sum_{k=1}^{c} \sum_{i=1}^{n_k} \bigl(t_{ik} - y_k(x_i)\bigr)^2, \qquad (2.6)$$
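As a small illustration of (2.6), the sketch below accumulates squared differences between target components and classifier outputs over all instances. The 0-1 target vectors and the output values are made-up numbers, and the function name `mse_risk` is hypothetical; no normalization is applied, which is an assumption about how the risk is scaled.

```python
def mse_risk(targets, outputs):
    # Sum of squared differences (t_ik - y_k(x_i))^2 over instances i
    # and output components k. No 1/n normalization is applied here.
    return sum(
        (t - y) ** 2
        for t_vec, y_vec in zip(targets, outputs)
        for t, y in zip(t_vec, y_vec)
    )

# Three instances, three classes, 0-1 coded targets (illustrative values).
targets = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
outputs = [[0.8, 0.1, 0.1], [0.2, 0.7, 0.1], [0.3, 0.3, 0.4]]

risk = mse_risk(targets, outputs)
print(risk)
```

The third instance contributes most of the risk, since its outputs are far from its target vector.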

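where $n_k$ is the number of instances of class $\omega_k$ and each $y_k$ depends on the parameter vector $w$. For $n \to \infty$, and after some mathematical manipulations, one obtains:

$$\hat{R}_{\mathrm{MSE}} \xrightarrow[n \to \infty]{} R_{\mathrm{MSE}} = \sum_{k=1}^{c} \int_{X|T} \bigl(\mathbb{E}[T_k \mid x] - y_k(x)\bigr)^2 f_{X|t}(x)\,dx + \sum_{k=1}^{c} \int_{X|T} \bigl(\mathbb{E}[T_k^2 \mid x] - \mathbb{E}^2[T_k \mid x]\bigr) f_{X|t}(x)\,dx. \qquad (2.7)$$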
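The key step behind the two terms of (2.7) can be sketched for a fixed $k$ and a fixed $x$. This is the standard add-and-subtract argument rather than necessarily the derivation used in the cited references; the shorthand $m_k(x)$ for the conditional mean is introduced here only for readability.

```latex
% Shorthand (an assumption of this sketch): m_k(x) = E[T_k | x].
\begin{align*}
\mathbb{E}\bigl[(T_k - y_k(x))^2 \mid x\bigr]
  &= \mathbb{E}\bigl[(T_k - m_k(x))^2 \mid x\bigr]
   + 2\,\bigl(m_k(x) - y_k(x)\bigr)\,\mathbb{E}\bigl[T_k - m_k(x) \mid x\bigr]
   + \bigl(m_k(x) - y_k(x)\bigr)^2 \\
  &= \bigl(\mathbb{E}[T_k^2 \mid x] - \mathbb{E}^2[T_k \mid x]\bigr)
   + \bigl(\mathbb{E}[T_k \mid x] - y_k(x)\bigr)^2 .
\end{align*}
```

The cross term vanishes because $\mathbb{E}[T_k - m_k(x) \mid x] = 0$. Integrating over $x$ and summing over $k$ then gives a term that depends on $y_k$ and a variance term that does not.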

The second term of (2.7) represents a variance of the $t_k$ and does not depend on parameter tuning. Thus, the minimization of $\hat{R}_{\mathrm{MSE}}$ for $n \to \infty$ implies the minimization of the first term of (2.7). In optimal conditions (to be mentioned shortly), that amounts to obtaining

$$y_k(x) = \mathbb{E}[T_k \mid x]. \qquad (2.8)$$

This result is the version for the classification setting of the general result ($Y = \mathbb{E}[Z \mid X]$) previously mentioned for the regression setting. Expression (2.8) can be written out in detail as:

$$y_k(x) = \mathbb{E}[T_k \mid x] = \sum_{i=1}^{c} t_i\, P(T_k = t_i \mid x); \qquad (2.9)$$

implying, for a 0-1 coding scheme of the $t_i$,

$$y_k(x) = P(T_k \mid x). \qquad (2.10)$$
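The link between (2.8) and (2.10) can be illustrated with a small simulation. The sketch below assumes a discrete input taking three values and a hypothetical posterior $P(T = 1 \mid x)$ for a single 0-1 coded target; these numbers are illustrative only. For each $x$, the constant output that minimizes the summed squared error is the sample mean of the targets, which is the relative frequency of the class and therefore an estimate of the posterior.

```python
import random

# Hypothetical posterior P(T = 1 | x) for a single 0-1 coded target.
posterior = {0: 0.2, 1: 0.5, 2: 0.9}

rng = random.Random(1)  # fixed seed so the run is reproducible
samples = []
for _ in range(30000):
    x = rng.choice([0, 1, 2])
    t = 1 if rng.random() < posterior[x] else 0  # draw the 0-1 target
    samples.append((x, t))

def mse_optimal_output(samples, x_value):
    # The constant y minimizing sum (t - y)^2 over the targets observed
    # at x_value is their sample mean.
    ts = [t for x, t in samples if x == x_value]
    return sum(ts) / len(ts)

estimates = {x: mse_optimal_output(samples, x) for x in posterior}
print(estimates)
```

With enough samples per input value, the MSE-optimal outputs should approach the assumed posterior probabilities, in line with (2.10).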
