The asymptotic and generalization behaviors of Example 3.6 can be confirmed for other Gaussian datasets with equal and unequal covariances. There is a theoretical justification for the good generalization of the MEE linear discriminant with independent Gaussian inputs, based on the following theorem.

Theorem 3.1. The minimization of Shannon's or Rényi's quadratic entropy of a weighted sum of $d$ independent Gaussian distributions implies the minimization of the norm of the weights.
Proof. The weighted sum of $d$ independent Gaussian distributions, $y = w^T x$, has the PDF

f(y) = g(y;\, w^T \mu,\, w^T \Sigma w),   (3.35)

with $\Sigma$ a diagonal matrix of the variances, since the distributions are independent. But:
H_S(Y) = \ln \sqrt{2\pi e\, w^T \Sigma w};   (3.36)

H_{R_2}(Y) = \ln \left( 2 \sqrt{\pi\, w^T \Sigma w} \right).   (3.37)
The quadratic form $w^T \Sigma w$ can be written as $\sum_{i=1}^{d} w_i^2 \sigma_i^2$; therefore, the minimization of either $H_S$ or $H_{R_2}$ implies the minimization of $\|w\|^2$.
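The theorem is easy to check numerically. The following minimal sketch (the weights, means, and variances are arbitrary values chosen for illustration) samples $y = w^T x$ for independent Gaussian inputs, verifies that the sample variance of $y$ matches the quadratic form $w^T \Sigma w = \sum_i w_i^2 \sigma_i^2$, and evaluates the closed-form entropies (3.36) and (3.37), which depend on $w$ only through that quadratic form:

import numpy as np

rng = np.random.default_rng(0)
mu = np.array([1.0, -0.5, 2.0])      # arbitrary input means (illustration only)
sigma = np.array([0.8, 1.5, 0.3])    # arbitrary input standard deviations
w = np.array([0.4, -0.7, 1.1])       # arbitrary weight vector

# Sample y = w^T x with independent inputs x_i ~ N(mu_i, sigma_i^2).
x = rng.normal(mu, sigma, size=(1_000_000, 3))
y = x @ w

# The variance of y equals the quadratic form w^T Sigma w = sum_i w_i^2 sigma_i^2.
q = np.sum(w**2 * sigma**2)
print(y.var(), q)  # the two values should agree closely

# Closed-form entropies (3.36) and (3.37): both are increasing in q,
# so minimizing either entropy drives the variance-weighted norm of w down.
H_S = np.log(np.sqrt(2 * np.pi * np.e * q))
H_R2 = np.log(2 * np.sqrt(np.pi * q))
print(H_S, H_R2)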
Whenever the error PDF approaches a Gaussian distribution in the final stages of the training process, Theorem 3.1 applies and we expect the minimization of $\|w\|^2$ to take place. As is known from the theory of SVMs, the minimization of $\|w\|^2$ is desirable since it implies a smaller Vapnik-Chervonenkis dimension, and therefore a smaller classifier complexity with better generalization [228, 43]. As a matter of fact, for Rényi's quadratic entropy a stronger assertion can be made:
Corollary 3.1.
The minimization of Rényi's quadratic entropy of the error
of a linear discriminant for independent Gaussian input distributions implies
the minimization of the norm of the weights.
Proof. We have:

f_{Y|t}(y) = g(y;\, m_t,\, \sigma), \quad \text{with } m_t = w^T \mu_t + w_0,\ \ \sigma^2 = w^T \Sigma w;

f_{E|t}(e) = f_{Y|t}(t - e) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left( -\frac{1}{2\sigma^2} (t - e - m_t)^2 \right).   (3.38)
Therefore, for targets $t = \pm 1$ and equal class priors, the error density is the mixture $f_E = \frac{1}{2}(f_{E|-1} + f_{E|1})$, and the Gaussian overlap integral $\int g(e; a, \sigma)\, g(e; b, \sigma)\, de = \frac{1}{2\sqrt{\pi}\,\sigma} \exp\left( -\frac{(a-b)^2}{4\sigma^2} \right)$ yields the information potential $V_{R_2}(E) = \int f_E^2(e)\, de$:

V_{R_2}(E) = \frac{1}{4\sqrt{\pi}\,\sigma} \left( 1 + \exp\left( -\frac{\left( 2 + w^T (\mu_{-1} - \mu_1) \right)^2}{4\sigma^2} \right) \right),   (3.39)

which is an increasing function of decreasing $\sigma$. Since $H_{R_2}(E) = -\ln V_{R_2}(E)$, minimizing the entropy maximizes $V_{R_2}(E)$ and therefore decreases $\sigma = \sqrt{w^T \Sigma w}$, thus, as we saw in Theorem 3.1, decreasing $\|w\|^2$.
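The behavior of (3.39) can be confirmed numerically. A minimal sketch (the mean-separation value $w^T(\mu_{-1} - \mu_1)$ below is an arbitrary assumption for illustration) evaluates $V_{R_2}(E)$ over shrinking $\sigma$ and shows that the potential grows, i.e., $H_{R_2}(E) = -\ln V_{R_2}(E)$ shrinks:

import numpy as np

def v_r2(sigma, delta):
    # Information potential (3.39); delta stands for w^T (mu_{-1} - mu_1).
    return (1 + np.exp(-(2 + delta) ** 2 / (4 * sigma ** 2))) / (4 * np.sqrt(np.pi) * sigma)

for sigma in [2.0, 1.0, 0.5, 0.25]:
    v = v_r2(sigma, delta=-0.8)  # arbitrary mean separation (assumption)
    print(sigma, v, -np.log(v))  # V_R2 increases and H_R2 decreases as sigma shrinks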