Table 4.1 Percentage of test error and standard deviations (in parentheses) obtained with SEE for the simulated Gaussian data. Minimization was used for distances d = 3 and d = 1.5, and maximization for d = 1.

  n      d     Bayes error   n_t = 200     n_t = 2000    n_t = 20000
  10^2   3     6.68%         6.79(2.41)    6.75(2.51)    6.75(2.51)
  10^4   3     6.68%         6.82(0.83)    6.70(0.81)    6.66(0.81)
  10^6   3     6.68%         6.81(0.20)    6.69(0.08)    6.68(0.08)
  10^2   1.5   22.66%        25.23(4.65)   24.67(4.58)   22.61(4.21)
  10^4   1.5   22.66%        25.32(2.49)   24.72(2.15)   22.80(0.46)
  10^6   1.5   22.66%        25.46(2.54)   24.83(2.21)   22.82(0.24)
  10^2   1     30.85%        30.63(4.64)   30.90(4.48)   30.70(4.82)
  10^4   1     30.85%        30.93(0.47)   30.87(0.47)   30.84(0.46)
  10^6   1     30.85%        30.93(0.17)   30.86(0.14)   30.85(0.14)
Table 4.2 Percentage of test error and standard deviation (in parentheses) obtained with maximization of SEE with increased h (h = 2.27) for d = 1.5.

  n      d     Bayes error   n_t = 200     n_t = 2000    n_t = 20000
  10^2   1.5   22.66%        22.95(3.93)   22.78(4.01)   22.47(4.14)
  10^4   1.5   22.66%        22.73(0.41)   22.65(0.43)   22.66(0.41)
  10^6   1.5   22.66%        22.75(0.17)   22.67(0.14)   22.67(0.13)
Table 4.1 shows the mean values and standard deviations over 1000 repetitions of the test error of each experiment, using h = 1.7, 0.1 and 0.8 for d = 1, 1.5 and 3, respectively. When the amount of available data is huge, SEE achieves Bayes discrimination as expected. For small datasets the picture is quite different. SEE still finds a good solution for d = 3 and d = 1, but performs poorly for d = 1.5, which is near the t value for Gaussian classes, that is, in the limbo between a choice to minimize or maximize. The reason lies in the highly non-smooth estimate of the input distributions by the KDE method, with h < h_IMSE. This can be solved by using fat estimation of the input PDFs, as we did for d = 1 and d = 3. As shown in Fig. 4.7, when h is too small one gets a non-smooth entropy function, while for large h the over-smoothed input PDF estimates provide a smooth entropy curve preserving the maximum. The results of Table 4.2 were obtained by applying fat estimation (h = 2.27) to the case d = 1.5. Now, SEE performs similarly to the cases d = 1 and d = 3.
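The effect of the bandwidth just described can be sketched numerically. The snippet below is a minimal illustration, not the book's code: it builds a Gaussian KDE over a hypothetical 1-D sample drawn from two Gaussian classes at distance d = 1.5, compares the plug-in Shannon entropy for a thin bandwidth (h = 0.1) against the fat one (h = 2.27), and checks that the fat estimate's variance is inflated to roughly s^2 + h^2. All names, sample sizes, and the seed are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: two unit-variance Gaussian classes at distance d = 1.5.
d = 1.5
x = np.concatenate([rng.normal(-d / 2, 1.0, 100),
                    rng.normal(+d / 2, 1.0, 100)])

def kde(grid, data, h):
    """Gaussian kernel density estimate with bandwidth h, evaluated on grid."""
    u = (grid[:, None] - data[None, :]) / h
    return np.exp(-0.5 * u**2).mean(axis=1) / (h * np.sqrt(2 * np.pi))

def shannon_entropy(data, h, n_grid=2048):
    """Plug-in Shannon entropy of the KDE, by numerical integration."""
    grid = np.linspace(data.min() - 5 * h, data.max() + 5 * h, n_grid)
    p = np.clip(kde(grid, data, h), 1e-300, None)
    dx = grid[1] - grid[0]
    return -np.sum(p * np.log(p)) * dx

# Thin vs fat estimation: the over-smoothed (fat) estimate yields a
# smoother density and a larger plug-in entropy.
for h in (0.1, 2.27):
    print(f"h = {h}: entropy = {shannon_entropy(x, h):.3f}")

# Variance inflation: the KDE is the empirical distribution convolved with
# N(0, h**2), so its variance is (sample variance) + h**2.
h = 2.27
grid = np.linspace(x.min() - 5 * h, x.max() + 5 * h, 4096)
p = kde(grid, x, h)
dx = grid[1] - grid[0]
m = np.sum(grid * p) * dx
v = np.sum((grid - m) ** 2 * p) * dx
print(v, x.var() + h**2)  # nearly equal
```

Running the loop with more bandwidth values between the two extremes traces how the entropy curve transitions from jagged to smooth, which is the qualitative behavior attributed to Fig. 4.7.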
The explanation for this behavior lies in the increased variance of the estimated PDFs, which for a Gaussian kernel is given by σ = √(s² + h²) (see