ent at zero raises difficulties for iterative optimization algorithms. As a matter of fact, the large popularity of the MSE risk functional stems from the existence of efficient optimization algorithms, particularly those based on the original adaptive training process known as the least-mean-square Widrow-Hoff algorithm (see, e.g., [142]).
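As a concrete illustration of that adaptive process, the following minimal sketch implements the Widrow-Hoff update in Python (the data shapes, the learning rate eta, and the function name are illustrative assumptions, not taken from the text). Each sample moves the weight vector along the instantaneous negative gradient of the squared error:

```python
import numpy as np

def lms_train(X, d, eta=0.01, epochs=50):
    # Widrow-Hoff (LMS) rule: for each sample x_i with desired
    # response d_i, update w <- w + eta * (d_i - w.x_i) * x_i,
    # i.e. a stochastic-gradient step on the squared error.
    n, m = X.shape
    w = np.zeros(m)
    for _ in range(epochs):
        for i in range(n):
            e = d[i] - w @ X[i]   # instantaneous error
            w += eta * e * X[i]   # gradient step
    return w
```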
2.1.2 The Cross-Entropy Risk
The cross-entropy (CE) loss function was first proposed (although without naming it that way) in [22]; it can be derived from the maximum likelihood (ML) method applied to the estimation of the posterior probabilities $P(T_k|X)$. Each component $y_k$ of the classifier output vector, assumed as taking value in $[0,1]$, is viewed as an estimate of the posterior probability $P(T_k|x)$, $k = 1, \ldots, c$, for any $x \in X$; i.e., $y_k = \hat{P}(T_k|x)$. Let us denote $P(T_k|x)$ simply by $p_k$. The occurrence of a target vector $t$ conditioned on a given input vector $x$, in other words, a realization of the r.v. $T|x$, is governed by the joint distribution of $(T_1|x, \ldots, T_c|x)$. For 0-1 coding the probability mass function of $T|x$ is multinomial with

$$P(T|x) = p_1^{t_1} p_2^{t_2} \cdots p_c^{t_c}. \qquad (2.12)$$
Note that for $c = 2$ formula (2.12) reduces to a binomial distribution, e.g. of $T_1$, as

$$P(T|x) = p_1^{t_1} (1 - p_1)^{(1 - t_1)}. \qquad (2.13)$$
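To see formulas (2.12) and (2.13) in action, here is a minimal numerical sketch in Python (the function name and the example values of the $p_k$ are illustrative assumptions):

```python
import numpy as np

def pmf_01(t, p):
    # P(T = t | x) = prod_k p_k^{t_k}, formula (2.12),
    # for a 0-1 coded target t (one component equal to 1).
    t, p = np.asarray(t), np.asarray(p)
    return np.prod(p ** t)

# Binomial special case (2.13) with c = 2 and p_1 = 0.7:
p = np.array([0.7, 0.3])
print(pmf_01([1, 0], p))  # 0.7 = p_1^1 (1 - p_1)^0
print(pmf_01([0, 1], p))  # 0.3 = p_1^0 (1 - p_1)^1
```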
Similarly, we assign a probabilistic model to the classifier outputs, by writing

$$P(Y|x) = y_1^{t_1} y_2^{t_2} \cdots y_c^{t_c}, \quad \text{with} \;\; y_k = P(Y_k|x), \qquad (2.14)$$

with the assumption that the outputs satisfy the same constraints as true probabilities do, namely $\sum_k P(Y_k|x) = 1$.
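The text does not prescribe how this constraint is enforced; a common choice (an assumption here, not taken from the text) is a softmax output layer, sketched below, which maps raw classifier scores to values in $[0,1]$ that sum to one and can therefore play the role of the $y_k$ in (2.14):

```python
import numpy as np

def softmax(z):
    # Map raw scores z to outputs in [0, 1] summing to 1,
    # usable as the probabilities y_k = P(Y_k | x) in (2.14).
    e = np.exp(z - z.max())  # shift for numerical stability
    return e / e.sum()

y = softmax(np.array([2.0, -1.0, 0.5]))
print(y, y.sum())  # all components in [0, 1]; sum equals 1.0
```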
We would like the $Y|x$ distribution to approximate the target distribution $T|x$. For this purpose we employ a loss function that maximizes the likelihood of $Y|x$ or, equivalently, minimizes the Kullback-Leibler (KL) divergence of $Y|x$ with respect to $T|x$ (see Appendix A).

The empirical estimate of the KL divergence for i.i.d. random variables is written in the present case as:
$$\hat{D}_{KL} = \frac{1}{n}\sum_{i=1}^{n} \ln \frac{P(T_i|x_i)}{P(Y_i|x_i)} = \frac{1}{n}\sum_{i=1}^{n} \ln \frac{p_{i1}^{t_{i1}} \cdots p_{ic}^{t_{ic}}}{y_{i1}^{t_{i1}} \cdots y_{ic}^{t_{ic}}} = -\frac{1}{n}\sum_{i=1}^{n}\sum_{k=1}^{c} t_{ik} \ln(y_{ik}) + \frac{1}{n}\sum_{i=1}^{n}\sum_{k=1}^{c} t_{ik} \ln(p_{ik}). \qquad (2.15)$$
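Only the first term of (2.15) depends on the classifier outputs, so minimizing the empirical KL divergence amounts to minimizing that cross-entropy term. The decomposition can be checked numerically with the following sketch (variable names and example values are illustrative assumptions; rows of T are 0-1 coded targets, and P, Y hold the $p_{ik}$ and $y_{ik}$):

```python
import numpy as np

def empirical_kl(T, P, Y):
    # Empirical KL estimate of (2.15):
    #   (1/n) sum_i ln P(T_i|x_i) / P(Y_i|x_i)
    # = -(1/n) sum_{i,k} t_ik ln(y_ik) + (1/n) sum_{i,k} t_ik ln(p_ik)
    n = T.shape[0]
    cross_entropy = -np.sum(T * np.log(Y)) / n  # depends on the classifier
    target_term = np.sum(T * np.log(P)) / n     # independent of the classifier
    return cross_entropy + target_term

T = np.array([[1, 0, 0], [0, 0, 1]])                # 0-1 coded targets
P = np.array([[0.8, 0.1, 0.1], [0.2, 0.2, 0.6]])    # posteriors p_ik
Y = np.array([[0.7, 0.2, 0.1], [0.1, 0.3, 0.6]])    # classifier outputs y_ik
print(empirical_kl(T, P, Y))                        # small positive value
```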