Figure 3.3 shows the dataset (the $-1$ class corresponds to the right cluster), the evolution of the decision border, and the error PDF (computed only at the $e_i$ values) at successive iteration steps, known as epochs, in one experiment. Epochs = 0 is the initial configuration with random weights and bias. Note the evolution of $f(e)$ towards a Gaussian-resembling PDF.

The solution at Epochs = 50 is practically the same as at Epochs = 35 and does not change thereafter; it is a convergent solution. The entropy and error rate graphs also become flat after Epochs = 35. We see that this convergent solution deviates somewhat from the optimal solution (the vertical line at $x_1 = 1$) because of insufficient bias adjustment in the last iterations.

The min $P_e$ value is exactly known for the two-class setting with equal-covariance Gaussian inputs, separated by a linear discriminant. The min $P_e$ is then also the Bayes optimal error given by [76]

$$P_e^{Bayes} = 1 - \Phi(\delta/2), \qquad (3.34)$$
with $\Phi(\cdot)$ the standardized normal CDF and $\delta^2 = (\mu_{-1} - \mu_1)^T \Sigma^{-1} (\mu_{-1} - \mu_1)$, $\delta$ being the Mahalanobis distance between the class means. In the present case $P_e^{Bayes} = 0.1587$.
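As a quick numerical check of (3.34), the sketch below evaluates the Bayes error for an assumed configuration consistent with the quoted value: unit-covariance Gaussian classes with means at $-1$ and $1$, so that $\delta = 2$. The particular means and covariance are illustrative, not taken from the book's experiment.

```python
import math
import numpy as np

def std_normal_cdf(x):
    """Standardized normal CDF Phi(x), via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def bayes_error(mu_neg, mu_pos, sigma):
    """Bayes optimal error for two equal-covariance Gaussian classes
    separated by a linear discriminant, Eq. (3.34)."""
    d = np.asarray(mu_neg, dtype=float) - np.asarray(mu_pos, dtype=float)
    # Mahalanobis distance: delta^2 = d^T Sigma^{-1} d
    delta = math.sqrt(d @ np.linalg.solve(np.asarray(sigma, dtype=float), d))
    return 1.0 - std_normal_cdf(delta / 2.0)

# Assumed setup: 1-D classes with means -1 and 1, unit variance (delta = 2)
pe = bayes_error([-1.0], [1.0], [[1.0]])
print(round(pe, 4))  # 0.1587
```

Any configuration with the same Mahalanobis distance yields the same error, since (3.34) depends on the class parameters only through $\delta$.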
We see that the final solution exhibits a training set and test set error of around 0.23. However, at Epochs = 28 a solution closer to the optimal one, with training set error $P_{ed} = 0.166$ and test set error $P_{et} = 0.164$, had been reached; progress from this stage was hindered by the insensitivity to $w_0$.
Example 3.4. This example is similar to the preceding one; the only difference is that we now use $h = 0.4$, as predicted by the optimal IMSE formula (E.19). The experiment illustrated in Fig. 3.4 clearly shows the convergence to two Dirac-$\delta$ functions near 1 and $-1$, as in Example 3.1 when an initial small $\sigma$ was used. This should not be surprising since $\sigma$ is here $\sqrt{w^T w}$, implying a convergence of the weights to zero, and therefore an error PDF represented by two Dirac-$\delta$ functions at $\pm 1 + w_0$, with the bias $w_0$ playing here the same role as parameter $d$ in Example 3.1. As a consequence we obtain the behavior already described in Sect. 3.1.3, with a final $P_{ed} = P_{et} = 0.5$. Note that with zero weights the decision border is undefined.
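The degenerate solution of Example 3.4 can be illustrated with a small simulation: once the weight vector is zero, the output is the constant given by the bias, so the error $e = t - y$ takes only two values (two Dirac-$\delta$ spikes near $\pm 1$, shifted by the bias term) and half of the points are misclassified. The two-Gaussian data and the tanh output unit below are assumptions for illustration, not the book's exact model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed setup: two Gaussian classes on x1 with targets t = -1 and +1
n = 10000
x = np.concatenate([rng.normal(-1.0, 1.0, n), rng.normal(1.0, 1.0, n)])
t = np.concatenate([-np.ones(n), np.ones(n)])

w, w0 = 0.0, 0.1          # zero weight: only the bias acts, border undefined
y = np.tanh(w * x + w0)   # constant output for every input
e = t - y                 # error PDF degenerates to two spikes

print(np.unique(np.round(e, 6)).size)  # only 2 distinct error values
error_rate = np.mean(np.sign(y) != t)  # one whole class is misclassified
print(error_rate)                      # 0.5
```

Because the output no longer depends on $x$, the classifier assigns every point to the same class, which is exactly the $P_{ed} = P_{et} = 0.5$ behavior described above.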