Table 4.1 Percentage of test error and standard deviations (in parentheses) obtained with SEE for the simulated Gaussian data. Minimization was used for distances d = 3 and d = 1.5, and maximization for d = 1.

  n      d     Bayes error   n_t = 200     n_t = 2000    n_t = 20000
  10^2   3     6.68%         6.79(2.41)    6.75(2.51)    6.75(2.51)
  10^4   3     6.68%         6.82(0.83)    6.70(0.81)    6.66(0.81)
  10^6   3     6.68%         6.81(0.20)    6.69(0.08)    6.68(0.08)
  10^2   1.5   22.66%        25.23(4.65)   24.67(4.58)   22.61(4.21)
  10^4   1.5   22.66%        25.32(2.49)   24.72(2.15)   22.80(0.46)
  10^6   1.5   22.66%        25.46(2.54)   24.83(2.21)   22.82(0.24)
  10^2   1     30.85%        30.63(4.64)   30.90(4.48)   30.70(4.82)
  10^4   1     30.85%        30.93(0.47)   30.87(0.47)   30.84(0.46)
  10^6   1     30.85%        30.93(0.17)   30.86(0.14)   30.85(0.14)
Table 4.2 Percentage of test error and standard deviation (in parentheses) obtained with maximization of SEE with increased h (h = 2.27) for d = 1.5.

  n      d     Bayes error   n_t = 200     n_t = 2000    n_t = 20000
  10^2   1.5   22.66%        22.95(3.93)   22.78(4.01)   22.47(4.14)
  10^4   1.5   22.66%        22.73(0.41)   22.65(0.43)   22.66(0.41)
  10^6   1.5   22.66%        22.75(0.17)   22.67(0.14)   22.67(0.13)
Table 4.1 shows the mean values and standard deviations over 1000 repetitions of the test error of each experiment, using h = 1.7, 0.1 and 0.8 for d = 1, 1.5 and 3, respectively. When the amount of available data is huge, SEE achieves Bayes discrimination as expected. For small datasets the picture is quite different. SEE still finds a good solution for d = 3 and d = 1, but performs poorly for d = 1.5, which is near the t value for Gaussian classes, that is, in the limbo between a choice to minimize or maximize. The reason lies in the highly non-smooth estimate of the input distributions by the KDE method, with h < h_IMSE. This can be solved by using fat estimation of the input PDFs, as we did for d = 1 and d = 3. As shown in Fig. 4.7, when h is too small one gets a non-smooth entropy function, while for large h the over-smoothed input PDF estimates provide a smooth entropy curve preserving the maximum. The results of Table 4.2 were obtained by applying fat estimation (h = 2.27) to the case d = 1.5. Now, SEE performs similarly to the cases d = 1 and d = 3.
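The effect of the bandwidth just described can be sketched numerically. The snippet below is a minimal illustration, not the book's code: it builds a Gaussian KDE over a hypothetical 1-D sample drawn from two Gaussian classes at distance d = 1.5, compares the plug-in Shannon entropy for a thin bandwidth (h = 0.1) against the fat one (h = 2.27), and checks that the fat estimate's variance is inflated to roughly s^2 + h^2. All names, sample sizes, and the seed are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: two unit-variance Gaussian classes at distance d = 1.5.
d = 1.5
x = np.concatenate([rng.normal(-d / 2, 1.0, 100),
                    rng.normal(+d / 2, 1.0, 100)])

def kde(grid, data, h):
    """Gaussian kernel density estimate with bandwidth h, evaluated on grid."""
    u = (grid[:, None] - data[None, :]) / h
    return np.exp(-0.5 * u**2).mean(axis=1) / (h * np.sqrt(2 * np.pi))

def shannon_entropy(data, h, n_grid=2048):
    """Plug-in Shannon entropy of the KDE, by numerical integration."""
    grid = np.linspace(data.min() - 5 * h, data.max() + 5 * h, n_grid)
    p = np.clip(kde(grid, data, h), 1e-300, None)
    dx = grid[1] - grid[0]
    return -np.sum(p * np.log(p)) * dx

# Thin vs fat estimation: the over-smoothed (fat) estimate yields a
# smoother density and a larger plug-in entropy.
for h in (0.1, 2.27):
    print(f"h = {h}: entropy = {shannon_entropy(x, h):.3f}")

# Variance inflation: the KDE is the empirical distribution convolved with
# N(0, h**2), so its variance is (sample variance) + h**2.
h = 2.27
grid = np.linspace(x.min() - 5 * h, x.max() + 5 * h, 4096)
p = kde(grid, x, h)
dx = grid[1] - grid[0]
m = np.sum(grid * p) * dx
v = np.sum((grid - m) ** 2 * p) * dx
print(v, x.var() + h**2)  # nearly equal
```

Running the loop with more bandwidth values between the two extremes traces how the entropy curve transitions from jagged to smooth, which is the qualitative behavior attributed to Fig. 4.7.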
The explanation for this behavior lies in the increased variance of the estimated PDFs, which for a Gaussian kernel is given by σ = √(s² + h²) (see