Figure 3.3 shows the dataset (the $-1$ class corresponds to the right cluster), the evolution of the decision border, and the error PDF (computed only at the $e_i$ values) at successive iteration steps, known as epochs, in one experiment. Epochs = 0 is the initial configuration with random weights and bias. Note the evolution of $f(e)$ towards a Gaussian-resembling PDF.

The solution at Epochs = 50 is practically the same as at Epochs = 35 and does not change thereafter; it is a convergent solution. The entropy and error rate graphs also become flat after Epochs = 35. We see that this convergent solution deviates somewhat from the optimal solution (the vertical line at $x_1 = 1$) because of insufficient bias adjustment in the last iterations.

The min $P_e$ value is exactly known for the two-class setting with equal-covariance Gaussian inputs, separated by a linear discriminant. The min $P_e$ is then also the Bayes optimal error given by [76]

$$P_e^{Bayes} = 1 - \Phi(\delta/2), \qquad (3.34)$$
with $\Phi(\cdot)$ the standardized normal CDF and $\delta^2 = (\mu_{-1} - \mu_1)^T \Sigma^{-1} (\mu_{-1} - \mu_1)$, $\delta$ being the Mahalanobis distance between the class means. In the present case $P_e^{Bayes} = 0.1587$.
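As a quick numerical check of (3.34), the sketch below evaluates the Bayes error for an assumed configuration consistent with the quoted value: unit-covariance Gaussian classes with means at $-1$ and $1$, so that $\delta = 2$. The particular means and covariance are illustrative, not taken from the book's experiment.

```python
import math
import numpy as np

def std_normal_cdf(x):
    """Standardized normal CDF Phi(x), via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def bayes_error(mu_neg, mu_pos, sigma):
    """Bayes optimal error for two equal-covariance Gaussian classes
    separated by a linear discriminant, Eq. (3.34)."""
    d = np.asarray(mu_neg, dtype=float) - np.asarray(mu_pos, dtype=float)
    # Mahalanobis distance: delta^2 = d^T Sigma^{-1} d
    delta = math.sqrt(d @ np.linalg.solve(np.asarray(sigma, dtype=float), d))
    return 1.0 - std_normal_cdf(delta / 2.0)

# Assumed setup: 1-D classes with means -1 and 1, unit variance (delta = 2)
pe = bayes_error([-1.0], [1.0], [[1.0]])
print(round(pe, 4))  # 0.1587
```

Any configuration with the same Mahalanobis distance yields the same error, since (3.34) depends on the class parameters only through $\delta$.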
We see that the final solution exhibits a training set and test set error of around 0.23. However, at Epochs = 28 a solution closer to the optimal one, with training set error $P_{ed} = 0.166$ and test set error $P_{et} = 0.164$, had been reached; progress from this stage was hindered by the insensitivity to $w_0$.
Example 3.4. This example is similar to the preceding one; the only difference is that we now use $h = 0.4$, as predicted by the optimal IMSE formula (E.19). The experiment illustrated in Fig. 3.4 clearly shows the convergence to two Dirac-$\delta$ functions near 1 and $-1$, as in Example 3.1 when an initial small $\sigma$ was used. This should not be surprising since $\sigma$ is here $\sqrt{w^T w}$, implying a convergence of the weights to zero, and therefore an error PDF represented by two Dirac-$\delta$ functions at $\pm 1 + w_0$, with the bias $w_0$ playing here the same role as parameter $d$ in Example 3.1. As a consequence we obtain the behavior already described in Sect. 3.1.3, with a final $P_{ed} = P_{et} = 0.5$. Note that with zero weights the decision border is undefined.
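The degenerate solution of Example 3.4 can be illustrated with a small simulation: once the weight vector is zero, the output is the constant given by the bias, so the error $e = t - y$ takes only two values (two Dirac-$\delta$ spikes near $\pm 1$, shifted by the bias term) and half of the points are misclassified. The two-Gaussian data and the tanh output unit below are assumptions for illustration, not the book's exact model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed setup: two Gaussian classes on x1 with targets t = -1 and +1
n = 10000
x = np.concatenate([rng.normal(-1.0, 1.0, n), rng.normal(1.0, 1.0, n)])
t = np.concatenate([-np.ones(n), np.ones(n)])

w, w0 = 0.0, 0.1          # zero weight: only the bias acts, border undefined
y = np.tanh(w * x + w0)   # constant output for every input
e = t - y                 # error PDF degenerates to two spikes

print(np.unique(np.round(e, 6)).size)  # only 2 distinct error values
error_rate = np.mean(np.sign(y) != t)  # one whole class is misclassified
print(error_rate)                      # 0.5
```

Because the output no longer depends on $x$, the classifier assigns every point to the same class, which is exactly the $P_{ed} = P_{et} = 0.5$ behavior described above.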