Even if we relax the conditions on the desired probability density functions, for instance, by choosing functions with no zeros on the $Y$ support but conveniently close to Dirac-$\delta$ functions, we may not yet reach the MEE condition for classification because of (2.43): attaining the KL minimum for one class-conditional distribution says nothing about the other class-conditional distribution or about $H_S$.
2.3.4 The Quest for Minimum Entropy
We have presented some important properties of $R_{\mathrm{MSE}}$ and $R_{\mathrm{CE}}$ in Sect. 2.2. We now discuss the properties of the $R_{\mathrm{SEE}}$ risk functional for classification problems. Restricting ourselves to the two-class setting with codomain restriction and $T = \{-1, 1\}$, we rewrite (2.43) as
$$
H_S(E) = \sum_{t \in \{-1,1\}} P(t) \int_{t-1}^{t+1} f_{E|t}(e)\,\ln\frac{1}{f_{E|t}(e)}\,de \;+\; H_S(T). \qquad (2.45)
$$
We see that $L_{EE_t}(e) = \ln\frac{1}{f_{E|t}(e)}$ are here the loss functions for the two classes. The difference relative to $L_{SE}$ and $L_{CE}$ (and other conventional, distance-like, loss functions) is that in this case the loss functions are expressed in terms of the unknown $f_{E|t}(e)$. Furthermore, in adaptive training of a classifier $f_{E|t}(e)$ will change in unforeseeable ways. The same can be said of Rényi's quadratic entropy, with gain function $f_{E|t}(e)$. Therefore, the properties of the entropy risk functionals have to be analyzed not in terms of loss functions but of the entropies themselves.
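To make the role of the unknown densities in (2.45) concrete, the following minimal sketch (an illustrative assumption, not the book's estimator) replaces $f_{E|t}$ by Gaussian kernel density estimates fitted to per-class error samples and approximates each per-class integral by the sample mean of the loss $\ln(1/\hat{f}_{E|t}(e))$; the function name and the KDE/Monte-Carlo choices are hypothetical.

```python
import numpy as np
from scipy.stats import gaussian_kde

def shannon_error_entropy(errors, targets):
    """Plug-in estimate of H_S(E) in (2.45) for T = {-1, 1}:
    sum_t P(t) * E[ln 1/f_{E|t}(e)] + H_S(T), with f_{E|t}
    replaced by a Gaussian KDE (assumption of this sketch)."""
    errors, targets = np.asarray(errors), np.asarray(targets)
    h, priors = 0.0, []
    for t in (-1, 1):
        e_t = errors[targets == t]
        p = len(e_t) / len(errors)          # empirical class prior P(t)
        priors.append(p)
        kde = gaussian_kde(e_t)             # estimate of f_{E|t}
        loss = -np.log(kde(e_t))            # per-sample loss ln 1/f_{E|t}(e)
        h += p * loss.mean()                # Monte-Carlo estimate of the integral
    h += -sum(p * np.log(p) for p in priors)  # H_S(T)
    return h

# Hypothetical usage with simulated classifier errors for the two classes.
rng = np.random.default_rng(0)
errs = np.concatenate([rng.normal(0.2, 0.3, 500), rng.normal(-0.1, 0.4, 500)])
tgts = np.concatenate([np.full(500, -1), np.full(500, 1)])
print(shannon_error_entropy(errs, tgts))
```

Note that, as the text stresses, the "loss" evaluated here changes whenever the error densities change during training, unlike a fixed distance-like loss.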
Although pattern recognition is a quest for minimum entropy [237], the
topic of entropy-minimizing distributions has only occasionally been studied,
namely in relation to finding optimal locations of PDFs in a mixture [115,
38] and applying the MinMax information measure to discrete distributions
[251]. Whereas entropy-maximizing distributions obeying given constraints
are well known, minimum entropy distributions on the real line are often
difficult to establish [125]. The only basic known result is that the minimum
entropy of unconstrained continuous densities corresponds to Dirac-
δ
combs
(sequences of Dirac-
δ
functions, including the single Dirac-
δ
function); for
discrete distributions the minimum entropy is zero and corresponds to a
single discrete Dirac-
δ
function.
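As a small numerical illustration of these statements (an assumed example, not from the text), the sketch below shows the discrete Shannon entropy approaching zero as a two-point distribution concentrates its mass on a single point, and the differential entropy of a Gaussian, $\tfrac{1}{2}\ln(2\pi e\sigma^2)$, decreasing without bound as $\sigma \to 0$, i.e., as the density approaches a Dirac-$\delta$.

```python
import numpy as np

# Discrete case: entropy of a two-point distribution as it concentrates.
for p in (0.5, 0.9, 0.99, 0.999):
    probs = np.array([p, 1.0 - p])
    h = -(probs * np.log(probs)).sum()
    print(f"P = ({p:.3f}, {1 - p:.3f})  ->  H_S = {h:.4f}")
# H_S tends to 0 as all mass moves to one point (a "discrete Dirac-delta").

# Continuous case: differential entropy of a Gaussian, 0.5*ln(2*pi*e*sigma^2),
# is unbounded below as sigma -> 0 (density approaching a Dirac-delta).
for sigma in (1.0, 0.1, 0.01, 0.001):
    h = 0.5 * np.log(2 * np.pi * np.e * sigma**2)
    print(f"sigma = {sigma:<6} ->  h(X) = {h:.4f}")
```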
Entropy magnitude is often thought to be associated with the magnitude of the PDF tails, in the sense that larger tails imply larger entropy. (A PDF $f(\cdot)$ has larger right tail than PDF $g(\cdot)$ for positive $x$ if $\exists x_0, \forall x > x_0,\ f(x) > g(x)$; similarly, for the left tail.) However, this presumption fails even in simple cases of constrained densities: the unit-variance Gaussian PDF, $g(x; 0, 1)$, has smaller tails than the unit-variance bilateral-exponential PDF, $e(x; \sqrt{2}) = \exp(-\sqrt{2}\,|x|)/\sqrt{2}$; however, the former has larger Shannon entropy, $\ln\sqrt{2\pi e} \approx 1.42$, than the latter, $1 + \ln\sqrt{2} \approx 1.35$.
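The comparison can be checked numerically; the short sketch below (an illustrative assumption, not part of the text) integrates $-f \ln f$ for both unit-variance densities and evaluates their tails at a few points, confirming that the bilateral exponential has the heavier tails but the smaller entropy.

```python
import numpy as np
from scipy.integrate import quad

# Unit-variance Gaussian and unit-variance bilateral exponential (Laplace).
gauss = lambda x: np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)
bexp = lambda x: np.exp(-np.sqrt(2) * np.abs(x)) / np.sqrt(2)

# Differential entropies by numerical integration of -f ln f.
h = lambda f: quad(lambda x: -f(x) * np.log(f(x)), -30, 30)[0]
print(f"Gaussian : h = {h(gauss):.4f}  (analytic ln sqrt(2*pi*e) = {0.5 * np.log(2 * np.pi * np.e):.4f})")
print(f"Bilateral: h = {h(bexp):.4f}  (analytic 1 + ln sqrt(2)   = {1 + 0.5 * np.log(2):.4f})")

# Tail comparison: for large |x| the bilateral exponential dominates the Gaussian.
for x in (2.0, 4.0, 6.0):
    print(f"x = {x}:  gauss = {gauss(x):.2e}   bexp = {bexp(x):.2e}")
```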