Applications - Minimum Error Entropy Classification - page 143

Information Technology Reference

In-Depth Information

15

c=2

10

h

5

0

0

50

100

150

200

250

300

350

400

450

500

15

c=3

10

h

5

0

0

50

100

150

200

250

300

350

400

450

500

15

c=4

10

h

5

0

0

50

100

150

200

250

300

350

400

450

500

n

Fig. 6.6 Value of h opt (dashed line) and h fat (solid line) for c =2,3,and4.

(Marked instances refer to experiments with datasets summarized in Table 6.2; see

text.)

Figure 6.6 shows the h values obtained in these experiments (Table 6.2),

with open circles for the "Best h " and crosses for the "Suggested h ". The

h fat values proposed by formula (6.8) closely follow these values.

One could think of making h an updated variable along the training pro-

cess, namely proportional to the error variance. This is suggested by the fact

that, as one approaches the optimal solution, the errors tend to concentrate

making sense to decrease h for the corresponding smaller data range. How-

ever, it turns out to be a tricky task to devise an empirical updating formula

which does not lead to instability in the vicinity of the minimum error, given

the wild variations that a small h value originates in the gradient calculations.

Figure 6.7 shows an MLP training error curve for the PB12 dataset [110] us-

ing an updated h , which is simply proportional to the error variance. Also

shown is the value of h itself. Near 100 epochs and with training error 10%

the algorithm becomes unstable loosing the capability to converge.

Next Page

Minimum Error Entropy Classification

Search WWH ::

Custom Search

Home