Figure 3.3 shows the dataset (the 1 class corresponds to the right cluster), the evolution of the decision border, and the error PDF (computed only at the e_i values) at successive iteration steps, known as epochs, in one experiment. Epochs = 0 is the initial configuration with random weights and bias. Note the evolution of f(e) towards a Gaussian-resembling PDF.

The solution at Epochs = 50 is practically the same as it was at Epochs = 35 and does not change thereafter; it is a convergent solution. The entropy and error-rate graphs also become flat after Epochs = 35. We see that this convergent solution deviates somewhat from the optimal solution (the vertical line at x_1 = 1) because of insufficient bias adjustment in the last iterations.

The min P_e value is exactly known for the two-class setting with equal-covariance Gaussian inputs, separated by a linear discriminant. The min P_e is then also the Bayes optimal error, given by [76]

P_e^{Bayes} = 1 − Φ(δ/2) ,   (3.34)

with Φ(·) the standardized normal CDF and δ² = (μ_1 − μ_{-1})^T Σ^{-1} (μ_1 − μ_{-1}), where δ is the Mahalanobis distance between the class means. In the present case P_e^{Bayes} = 0.1587.
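To make Eq. (3.34) concrete, here is a minimal sketch of the computation. The means and covariance below are hypothetical, chosen so that the class means lie 2 apart along x_1 with identity covariance (so the optimal boundary is the vertical line at x_1 = 1, δ = 2, and the formula reproduces the value 0.1587 quoted above):

```python
import math
import numpy as np

def bayes_error(mu_1, mu_m1, sigma):
    """Bayes optimal error for two equal-covariance Gaussian classes,
    Eq. (3.34): P_e = 1 - Phi(delta/2), with delta the Mahalanobis
    distance between the class means."""
    diff = np.asarray(mu_1, float) - np.asarray(mu_m1, float)
    # delta^2 = diff^T Sigma^{-1} diff, solved without forming the inverse
    delta = math.sqrt(diff @ np.linalg.solve(np.asarray(sigma, float), diff))
    # Standardized normal CDF via the error function
    phi = 0.5 * (1.0 + math.erf((delta / 2.0) / math.sqrt(2.0)))
    return 1.0 - phi

# Hypothetical setup: means at (0,0) and (2,0), identity covariance
pe = bayes_error([2.0, 0.0], [0.0, 0.0], np.eye(2))
print(round(pe, 4))  # 0.1587
```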
We see that the final solution exhibits a training-set and test-set error of around 0.23. However, at Epochs = 28 a solution closer to the optimal one, with training-set error P_ed = 0.166 and test-set error P_et = 0.164, had been reached; progress from this stage was hindered by the insensitivity to w_0.
Example 3.4. This example is similar to the preceding one; the only difference is that we now use h = 0.4, as predicted by the optimal IMSE formula (E.19). The experiment illustrated in Fig. 3.4 clearly shows the convergence to two Dirac-δ functions near 1 and −1, as in Example 3.1 when an initial small σ was used. This should not be surprising since σ is here w^T w, implying a convergence of the weights to zero, and therefore an error PDF represented by two Dirac-δ functions at ±1 + w_0, with the bias w_0 playing here the same role as parameter d in Example 3.1. As a consequence we obtain the behavior already described in Sect. 3.1.3, with a final P_ed = P_et = 0.5. Note that with zero weights the decision border is undefined.