MEE with Continuous Errors - Minimum Error Entropy Classification

3.3.2 Theoretical and Empirical MEE Behaviors

We start by analyzing simple settings with Gaussian inputs and then move on to more realistic settings. The simple Gaussian-input settings provide the basic insights into the distinct aspects of theoretical and empirical MEE, in a theoretically controlled way. More realistic datasets serve to confirm those insights.

3.3.2.1 Univariate and Bivariate Gaussian Datasets

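Let us consider the perceptron with Gaussian inputs and the tanh activation function. Applying Theorem 3.2, the class-conditional error densities are:

\[
f_{E|t}(e) = \frac{\exp\left(-\dfrac{1}{2}\,\dfrac{\left(\operatorname{atanh}(t-e) - (\mathbf{w}^T\boldsymbol{\mu}_t + w_0)\right)^2}{\mathbf{w}^T\boldsymbol{\Sigma}_t\mathbf{w}}\right)}{\sqrt{2\pi\,\mathbf{w}^T\boldsymbol{\Sigma}_t\mathbf{w}}\;\, e\,(2t-e)}\;\mathbb{1}_{]t-1,\,t+1[}(e). \tag{3.43}
\]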

We first consider the univariate case $\varphi(w_1 x + w_0)$, with $w_1$ controlling the steepness of the activation function; the error density is then

\[
f_{E|t}(e) = \frac{\exp\left(-\dfrac{1}{2}\left(\dfrac{\operatorname{atanh}(t-e) - (w_1\mu_t + w_0)}{w_1\sigma_t}\right)^2\right)}{\sqrt{2\pi}\; w_1\sigma_t\; e\,(2t-e)}\;\mathbb{1}_{]t-1,\,t+1[}(e). \tag{3.44}
\]

Even for this simple case there is no closed-form expression for $H_S$ (or $H_{R_2}$). One has to resort to numerical integration and apply expression (C.3) (or (C.5)). Setting, w.l.o.g., $(\mu_{-1}, \sigma_{-1}) = (0, 1)$, we obtain the $H_S$ behavior shown in Fig. 3.14 [212].
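The numerical integration mentioned above can be sketched as follows. This is only a minimal illustration under stated assumptions, not the procedure used to produce Fig. 3.14: it assumes $w_1 > 0$, equal class priors, and takes $H_S = -\int f_E \ln f_E \, de$ with $f_E$ the prior-weighted mixture of the class-conditional densities of Eq. (3.44) (expression (C.3) is not reproduced in this section). The function names `error_density`, `trapezoid`, `open_grid` and `shannon_entropy` are hypothetical helpers introduced here, and a plain trapezoid rule on a uniform grid stands in for whatever quadrature one prefers.

```python
import numpy as np

def error_density(e, t, w1, w0, mu_t, sigma_t):
    """Class-conditional error density of Eq. (3.44) for target t in {-1, +1}.

    e is a 1-D array; the density is zero outside ]t-1, t+1[.
    Assumes w1 > 0 (assumption of this sketch).
    """
    e = np.asarray(e, dtype=float)
    f = np.zeros_like(e)
    inside = (e > t - 1.0) & (e < t + 1.0)
    ei = e[inside]
    z = (np.arctanh(t - ei) - (w1 * mu_t + w0)) / (w1 * sigma_t)
    jac = ei * (2.0 * t - ei)          # equals 1 - (t - e)^2 because t^2 = 1
    f[inside] = np.exp(-0.5 * z**2) / (np.sqrt(2.0 * np.pi) * w1 * sigma_t * jac)
    return f

def trapezoid(x, y):
    """Plain trapezoid rule, written out to avoid depending on a NumPy version."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))

def open_grid(t, n=200_001):
    """Uniform grid on the open interval ]t-1, t+1[ (endpoints dropped)."""
    return np.linspace(t - 1.0, t + 1.0, n + 2)[1:-1]

def shannon_entropy(w1, w0, params, priors, n=200_001):
    """H_S of the error mixture; the two class supports are disjoint,
    so the integral splits into one term per class."""
    h = 0.0
    for t, (mu_t, sigma_t) in params.items():
        e = open_grid(t, n)
        f = priors[t] * error_density(e, t, w1, w0, mu_t, sigma_t)
        integrand = np.zeros_like(f)
        pos = f > 0.0                   # skip underflowed values, 0*ln(0) := 0
        integrand[pos] = -f[pos] * np.log(f[pos])
        h += trapezoid(e, integrand)
    return h

# Setting of Fig. 3.14a: (mu_-1, sigma_-1) = (0, 1), (mu_1, sigma_1) = (3, 1).
params = {-1: (0.0, 1.0), 1: (3.0, 1.0)}
priors = {-1: 0.5, 1: 0.5}              # equal priors: assumption of this sketch

w1 = 1.0
for split in (1.0, 1.5, 2.0):           # split point x = -w0 / w1
    w0 = -w1 * split
    print(split, shannon_entropy(w1, w0, params, priors))
```

A uniform grid is only adequate for moderate $w_1$: as $w_1$ grows, the density of Eq. (3.44) concentrates in narrow spikes near the endpoints of $]t-1, t+1[$, and a much finer or adaptive quadrature is needed there.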

Figure 3.14a corresponds to $(\mu_1, \sigma_1) = (3, 1)$. The optimal split point (the "decision border" in this case) is at $x^* = 1.5$. We observe that for small values of $w_1$ (top figure) $H_S$ exhibits a maximum at the optimal split point, instead of a minimum. A minimum is obtained for a sufficiently large $w_1$ (bottom figure). The same behavior is observed in Fig. 3.14b, corresponding to $(\mu_1, \sigma_1) = (1, 1)$ with $x^* = 0.5$. This behavior is, in fact, general for both $H_S$ and $H_{R_2}$, regardless of the degree of distribution overlap: the theoretical MEE perceptron is able to produce the min $P_e$ solution.

We now move to the bivariate case, fixing $\boldsymbol{\mu}_{-1} = [0\ 0]^T$, $\boldsymbol{\Sigma}_t = \mathbf{I}$, and study two different settings: $\boldsymbol{\mu}_1 = [5\ 0]^T$ (distant classes) and $\boldsymbol{\mu}_1 = [1\ 0]^T$ (close classes). For these two settings the min $P_e$ value is 0.0062 and 0.3085, respectively. These min $P_e$ values correspond to infinitely many optimal solutions $\mathbf{w}^* = [w_1\ 0\ w_0]^T$: any $(w_1, w_0)$ pair s.t. $-w_0/w_1 = 2.5$ and $-w_0/w_1 = 0.5$, respectively.
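The two min $P_e$ values quoted above can be checked with a few lines of arithmetic. The sketch below assumes equal class priors (which is what the quoted values correspond to) and $w_1 > 0$; the helper names `std_normal_cdf`, `min_pe_equal_priors` and `perceptron_pe` are hypothetical and introduced only for this illustration. With identity covariances, the minimum error is $\Phi(-d/2)$, where $d$ is the distance between the class means and $\Phi$ is the standard normal distribution function.

```python
from math import erfc, sqrt

def std_normal_cdf(z):
    """Standard normal distribution function, via the complementary error function."""
    return 0.5 * erfc(-z / sqrt(2.0))

def min_pe_equal_priors(mu_a, mu_b):
    """Minimum error for two Gaussian classes with identity covariance
    and equal priors (assumption of this sketch): Phi(-d / 2)."""
    d = sqrt(sum((a - b) ** 2 for a, b in zip(mu_a, mu_b)))
    return std_normal_cdf(-d / 2.0)

def perceptron_pe(w1, w0, mu1_x):
    """Error of the decision rule sign(w1 * x1 + w0), with w1 > 0, for
    class -1 centred at [0, 0] and class 1 centred at [mu1_x, 0]."""
    border = -w0 / w1
    miss_neg = 1.0 - std_normal_cdf(border)        # class -1 beyond the border
    miss_pos = std_normal_cdf(border - mu1_x)      # class 1 before the border
    return 0.5 * (miss_neg + miss_pos)

print(round(min_pe_equal_priors([0, 0], [5, 0]), 4))   # distant classes
print(round(min_pe_equal_priors([0, 0], [1, 0]), 4))   # close classes

# Any (w1, w0) with -w0 / w1 equal to the midpoint attains the same error.
print(perceptron_pe(2.0, -5.0, 5.0), perceptron_pe(0.4, -1.0, 5.0))
```

Only the ratio $-w_0/w_1$ enters `perceptron_pe`, which is why infinitely many weight vectors achieve the same minimum.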
