Substituting (3.55) and (3.56) in (3.53) we finally obtain the expression (3.46) of $V_{R_2}(E)$.
Remarks:
1. Note that the supports of $f_{E|-1}(e)$ and $f_{E|1}(e)$ are disjoint: $]-\pi, 0[$ and $]0, \pi[$, respectively. This is a consequence of using a sigmoidal activation function, and contrasts with what happened with the linear discriminant (a numerical illustration is given after these remarks).
2. The expression of $V_{R_2}(E)$ has terms $m_t$ containing the bias $w_0$. Therefore, and also in contrast with the linear discriminant, the MEE algorithm is now always able to adjust the bias of the decision borders. The same remark applies to the EEs of formulas (3.44) and (3.45).
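As a quick numerical illustration of Remark 1, the sketch below assumes the arctan perceptron output is $\arctan(\mathbf{w}^T\mathbf{x} + w_0)$ with targets $t = \pm\pi/2$ (a convention consistent with the stated supports, not confirmed by this excerpt); the weights, bias, and class means are arbitrary choices. It confirms that the class-conditional errors fall in $]-\pi, 0[$ and $]0, \pi[$:

```python
import numpy as np

rng = np.random.default_rng(0)
w, w0 = np.array([1.0, -0.5]), 0.2  # arbitrary weights and bias
mu = {-1: np.array([-1.0, 0.0]), 1: np.array([1.0, 0.0])}  # arbitrary class means

for t in (-1, 1):
    # Gaussian inputs with identity covariance and mean mu[t]
    x = rng.normal(size=(100_000, 2)) + mu[t]
    # error = target - arctan output, with targets +/- pi/2
    e = t * np.pi / 2 - np.arctan(x @ w + w0)
    print(f"class {t:+d}: errors range over ({e.min():.3f}, {e.max():.3f})")
# class -1 errors stay inside ]-pi, 0[; class +1 errors inside ]0, pi[
```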
It is a well-known fact that, for two-class problems with Gaussian inputs having equal covariance matrix $\Sigma$, the Bayes linear decision function for equal priors [182, 76] is
$$d(\mathbf{x}) = \left[\mathbf{x} - \tfrac{1}{2}(\boldsymbol{\mu}_{-1} + \boldsymbol{\mu}_{1})\right]^T \Sigma^{-1} (\boldsymbol{\mu}_{1} - \boldsymbol{\mu}_{-1}); \qquad (3.57)$$
in other words, the linear discriminant $d(\mathbf{x}) = 0$ passes through the point lying half-way between the means, $\tfrac{1}{2}(\boldsymbol{\mu}_{-1} + \boldsymbol{\mu}_{1})$, and is orthogonal to $\Sigma^{-1}(\boldsymbol{\mu}_{1} - \boldsymbol{\mu}_{-1})$. We now analyze, in the next theorem, how the arctan perceptron behaves with respect to this property.
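Before the theorem, a minimal sanity check of (3.57); the means and covariance are arbitrary choices. It verifies that $d(\mathbf{x})$ vanishes at the half-way point of the means and along any direction orthogonal to $\Sigma^{-1}(\boldsymbol{\mu}_{1} - \boldsymbol{\mu}_{-1})$:

```python
import numpy as np

mu_m1, mu_p1 = np.array([0.0, 1.0]), np.array([2.0, 3.0])  # class means (arbitrary)
Sigma = np.array([[2.0, 0.5], [0.5, 1.0]])                 # shared covariance (arbitrary)

v = np.linalg.solve(Sigma, mu_p1 - mu_m1)  # Sigma^{-1} (mu_1 - mu_{-1})
mid = 0.5 * (mu_m1 + mu_p1)                # half-way point of the means

def d(x):
    """Bayes linear decision function (3.57) for equal priors."""
    return (x - mid) @ v

print(d(mid))                  # 0: the border passes through the midpoint
u = np.array([-v[1], v[0]])    # a direction orthogonal to v in 2-D
print(d(mid + 3.7 * u))        # ~0: the border d(x) = 0 is orthogonal to v
```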
Theorem 3.3. The Bayes linear discriminant for equal-prior two-class prob-
lems, with Gaussian inputs having the same covariance Σ , is a critical point
of Rényi's quadratic entropy of the arctan perceptron.
Proof. We first note that, given a two-class problem with Gaussian inputs $S$ having the same covariance $\Sigma$, one can always transform it into an equivalent problem with covariance $I$ by applying the whitening transformation to the inputs (see e.g. [76]): $\mathbf{x} = (\Phi\Lambda^{-1/2})^T \mathbf{s}$, where $\mathbf{s}$ is a (vector) instance of the multi-dimensional r.v. $S$, $\Phi$ and $\Lambda$ are the eigenvector and eigenvalue matrices of $\Sigma$, respectively, and $\mathbf{x}$ is the corresponding instance of the whitened multi-dimensional r.v. $X$. Since the whitening transformation is a linear transformation, the position of the critical points of a continuous function of $S$, like the one implemented by the arctan perceptron, is also linearly transformed by $(\Phi\Lambda^{-1/2})^T$. For the whitened inputs the Bayes discriminant is orthogonal to the vector linking the means, $\boldsymbol{\mu}_{1} - \boldsymbol{\mu}_{-1}$.
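A minimal numerical sketch of the whitening step (the covariance matrix and sample size are arbitrary choices): after $\mathbf{x} = (\Phi\Lambda^{-1/2})^T \mathbf{s}$, the sample covariance of the transformed inputs is close to $I$:

```python
import numpy as np

rng = np.random.default_rng(1)
Sigma = np.array([[3.0, 1.0], [1.0, 2.0]])  # arbitrary covariance of S
S = rng.multivariate_normal([0.0, 0.0], Sigma, size=200_000)

lam, Phi = np.linalg.eigh(Sigma)   # eigenvalues Lambda and eigenvectors Phi
A = Phi @ np.diag(lam ** -0.5)     # whitening matrix Phi Lambda^{-1/2}
X = S @ A                          # x = A^T s, applied to every sample row

print(np.cov(X, rowvar=False).round(3))  # ~ identity matrix
```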
We then apply the previous Lemma to the whitened-inputs problem, and proceed to study the critical points of $V(\mathbf{w}, w_0) \equiv 8\sqrt{\pi}\, V_{R_2}(E; \mathbf{w}, w_0)$:
$$V(\mathbf{w}, w_0) = \sum_t \frac{1}{\sigma}\left(1 + m_t^2 + \frac{\sigma^2}{2}\right) = \frac{1}{\sigma}\left(1 + m_{-1}^2 + 1 + m_{1}^2\right) + \sigma, \qquad (3.58)$$
with $\sigma = \sqrt{\mathbf{w}^T I \mathbf{w}} = \|\mathbf{w}\|$. Let us take the partial derivatives. For the bias term:
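The derivative itself is truncated here; a sketch of how the computation would proceed, assuming $m_t$ denotes the class-conditional mean $\mathbf{w}^T \boldsymbol{\mu}_t + w_0$ of the perceptron pre-activation (an assumption not confirmed by this excerpt), so that $\partial m_t / \partial w_0 = 1$: from (3.58),
$$\frac{\partial V}{\partial w_0} = \frac{2(m_{-1} + m_1)}{\sigma},$$
which vanishes iff $m_{-1} + m_1 = \mathbf{w}^T(\boldsymbol{\mu}_{-1} + \boldsymbol{\mu}_{1}) + 2 w_0 = 0$, i.e., iff the border $\mathbf{w}^T \mathbf{x} + w_0 = 0$ passes through the half-way point $\tfrac{1}{2}(\boldsymbol{\mu}_{-1} + \boldsymbol{\mu}_{1})$ of the means, as in the Bayes discriminant (3.57).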