EE-Inspired Risks - Minimum Error Entropy Classification - page 124


Example 5.4. Consider two Gaussian distributed class conditional PDFs with

$$\mu_{-1} = [0 \;\; 0]^T; \quad \mu_{1} = [3 \;\; 0]^T; \quad \Sigma_{-1} = \Sigma_{1} = I. \tag{5.19}$$
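As an illustration, the two classes in (5.19) could be sampled as in the minimal sketch below, with 100 instances per class as used in this example. The function name `make_two_gaussians`, the seeds, and the use of NumPy are assumptions made for illustration only; the source does not say how the data were generated.

```python
import numpy as np

def make_two_gaussians(n_per_class=100, seed=0):
    """Sample the two classes of (5.19): means [0, 0] and [3, 0], identity covariance.

    Hypothetical helper; labels are -1 and +1 to match the class indices.
    """
    rng = np.random.default_rng(seed)
    cov = np.eye(2)
    x_neg = rng.multivariate_normal(np.array([0.0, 0.0]), cov, size=n_per_class)
    x_pos = rng.multivariate_normal(np.array([3.0, 0.0]), cov, size=n_per_class)
    X = np.vstack([x_neg, x_pos])
    t = np.concatenate([-np.ones(n_per_class), np.ones(n_per_class)])
    return X, t

# Independent training and test sets: different seeds (an arbitrary choice).
X_train, t_train = make_two_gaussians(seed=0)
X_test, t_test = make_two_gaussians(seed=1)
print(X_train.shape, X_test.shape)
```

Using separate seeds is one simple way to keep the training and test samples independent.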

Independent training and test sets are generated with $n = 200$ instances (100 per class) and $R_{ZED}$ is used with $h = 1$ and $h = 3$. An initial $\eta = 0.001$ is used. Figs. 5.3 and 5.4 show the final converged solution after 80 epochs of training: the final decision border and error PDF estimate, together with $R_{ZED}$ and the training and test misclassification rates along the training process. We see that a higher $h$ requires more initial epochs before $R_{ZED}$ starts to increase significantly. This is explained by formula (5.17): a higher $h$ implies a lower $\partial f_E(0)/\partial w_k$ and consequently smaller gradient ascent steps (the adaptive $\eta$ then compensates for the influence of the higher $h$). On the other hand, better generalization is obtained with $h = 3$. In fact, with a lower $h$ the perceptron provides better discrimination on the training set (compare the decision borders in both figures), but at the cost of an increased test set error. The use of a higher $h$ provides an oversmoothed estimate of $R_{ZED}$, masking local off-optimal solutions.
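The effect of $h$ on the step size can be checked numerically. The sketch below is not formula (5.17), which is not reproduced in this section. It assumes a Gaussian-kernel Parzen estimate of the error density at zero, $\hat{f}_E(0) = \frac{1}{nh}\sum_i G(e_i/h)$, a perceptron with output $y = \tanh(w_0 + w^T x)$, and errors $e = t - y$; the gradient is derived from those assumptions. The function names and the zero initial weights are hypothetical.

```python
import numpy as np

def gauss(u):
    """Standard Gaussian kernel G(u)."""
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)

def zed_risk_and_grad(w, X, t, h):
    """Parzen estimate of the error density at e = 0 and its gradient w.r.t. w.

    Assumed model: y = tanh(w[0] + w[1:] . x), error e = t - y.
    """
    Xb = np.hstack([np.ones((X.shape[0], 1)), X])  # prepend a bias input
    y = np.tanh(Xb @ w)
    e = t - y
    k = gauss(e / h)
    n = len(e)
    risk = k.sum() / (n * h)
    # d risk / d w = 1/(n h^3) * sum_i e_i G(e_i/h) (1 - y_i^2) x_i
    grad = ((e * k * (1.0 - y**2)) @ Xb) / (n * h**3)
    return risk, grad

# Data as in (5.19): 100 instances per class, identity covariance.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal([0.0, 0.0], 1.0, size=(100, 2)),
               rng.normal([3.0, 0.0], 1.0, size=(100, 2))])
t = np.concatenate([-np.ones(100), np.ones(100)])

w0 = np.zeros(3)  # assumed starting point; the source does not give one
grad_norms = {}
for h in (1.0, 3.0):
    risk, grad = zed_risk_and_grad(w0, X, t, h)
    grad_norms[h] = np.linalg.norm(grad)
    print(f"h={h}: risk estimate={risk:.4f}, gradient norm={grad_norms[h]:.4f}")
```

Under these assumptions, at zero weights every error is $\pm 1$, so the ratio of the gradient norms for $h = 3$ and $h = 1$ is $e^{4/9}/27 \approx 0.058$ regardless of the sample. This is consistent with the smaller initial ascent steps described above.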

[Figure 5.3 panels: decision border in the (x1, x2) plane; error PDF estimate versus e; estimated R_ZED versus epochs; error rate versus epochs. Error Rate (Test) = 0.070.]

Fig. 5.3 The final converged solution of Example 5.4 with h = 1.
