Example 3.5. Again an example similar to Example 3.3 (fat estimation of the error PDF); the only difference is that we now use smaller variances, $\Sigma_{-1} = \Sigma_{1} = 0.12 \times \mathbf{I}$, drastically increasing the separability of the classes. In this case $\min P_e = P_e^{\text{Bayes}} \approx 0$.
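For two equal-prior Gaussian classes sharing a covariance $\Sigma$, the Bayes error has a well-known closed form; the worked numbers below assume, purely for illustration, class means $\mu_{-1}$ and $\mu_{1}$ two units apart (the excerpt does not state them):

$$P_e^{\text{Bayes}} = \Phi\!\left(-\frac{\Delta}{2}\right), \qquad \Delta^{2} = (\mu_{1} - \mu_{-1})^{\top}\, \Sigma^{-1}\, (\mu_{1} - \mu_{-1}),$$

so with $\Sigma = 0.12\,\mathbf{I}$ and $\|\mu_{1} - \mu_{-1}\| = 2$ one gets $\Delta = 2/\sqrt{0.12} \approx 5.77$ and $P_e^{\text{Bayes}} = \Phi(-2.89) \approx 0.002$, essentially zero.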
Figure 3.5 again shows a convergent solution that is sub-optimal due to the lack of bias adjustment. Interestingly enough, at epochs 23 and 24 the test set error rate was zero. The linear discriminant moved from right to left and completely separated the classes (zero training set error) at epoch 24, when the Shannon entropy was still decreasing. The bias adjustment was then lost, and the discriminant kept moving slightly to the left, pulled by the other parameter adjustments, until the entropy stabilized near epoch 32.
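As a concrete illustration of the training loop behind these curves, here is a minimal sketch (not the book's code) of MEE training of a linear discriminant with a tanh output, where the risk is the Parzen-window Shannon entropy estimate of the errors. Only the covariance $0.12\,\mathbf{I}$ comes from Example 3.5; the class means, sample sizes, kernel bandwidth h, and learning rate are all illustrative assumptions.

import numpy as np

# A minimal MEE sketch (illustrative, not the book's code). Assumed for
# illustration: class means (-1, 0) and (1, 0), 100 samples per class,
# Parzen bandwidth h = 0.5, learning rate 0.05; only the covariance
# Sigma = 0.12 * I comes from Example 3.5.
rng = np.random.default_rng(0)
n = 100
X = np.vstack([rng.multivariate_normal([-1.0, 0.0], 0.12 * np.eye(2), n),
               rng.multivariate_normal([ 1.0, 0.0], 0.12 * np.eye(2), n)])
t = np.hstack([-np.ones(n), np.ones(n)])       # targets in {-1, +1}

h, lr, eps = 0.5, 0.05, 1e-5
params = np.array([0.1, -0.1, 0.0])            # [w1, w2, bias]

def error_entropy(p):
    """Parzen-window Shannon entropy of the errors e = t - tanh(w.x + b)."""
    e = t - np.tanh(X @ p[:2] + p[2])
    d = e[:, None] - e[None, :]                 # pairwise error differences
    f = np.exp(-d**2 / (2 * h**2)).mean(axis=1) / (np.sqrt(2 * np.pi) * h)
    return -np.mean(np.log(f))                  # H_S estimate, the MEE risk

for epoch in range(50):
    grad = np.zeros(3)                          # central-difference gradient
    for i in range(3):
        up, dn = params.copy(), params.copy()
        up[i] += eps
        dn[i] -= eps
        grad[i] = (error_entropy(up) - error_entropy(dn)) / (2 * eps)
    params -= lr * grad                         # descend on the entropy
    y = np.tanh(X @ params[:2] + params[2])
    if epoch % 10 == 0:
        print(f"epoch {epoch:2d}: H_S = {error_entropy(params):.3f}, "
              f"training error = {np.mean(np.sign(y) != t):.3f}")

Running such a sketch reproduces the qualitative behaviour described above: once the classes are separated the training error is zero, yet the entropy estimate keeps decreasing for a few more epochs, so nothing anchors the bias and the discriminant can keep drifting.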
[Fig. 3.5 panels: the estimated error PDF $\hat{f}_E(e)$; the two-class scatter in the $(x_1, x_2)$ plane with the linear discriminant, titled "Error Rate (Test) = 0.020"; and the Shannon entropy estimate $\hat{H}_S$ and the error rate plotted against epochs 0 to 50.]
Fig. 3.5 Graphs showing an experiment as in Fig. 3.3 (epochs = 50), but with well-separated classes.
3.2.2 Consistency and Generalization
The one-experiment runs in the preceding section help to gain insight into MEE-trained linear discriminants, but are of course insufficient to draw general conclusions, even when we stick to the two-class bivariate Gaussian input scenario.