[Fig. 3.9 plots: left, error PDF estimate f̂_E(e) surface over (x1, x2); right, Shannon entropy Ĥ_S and Error Rate vs. epochs; Error Rate (Test) = 0.227]
Fig. 3.9 The final converged solution of Example 3.7. The bottom graphs show the Shannon entropy and the error rate (solid line for the training set, dotted line for the test set) as functions of the number of epochs.
A word about the operational parameter h: had we used too small an h (say, 0.3), we would not have obtained convergence to the desired solution; instead, we would have obtained convergence to two Dirac-δ functions as in Example 3.4. The smallest value of h guaranteeing the desired convergence in the present case is 0.75. For too high values of h the error PDF is oversmoothed and poorly reflects the class-conditional components of the error. As a result, the error rates behave in a somewhat erratic way, as shown in Fig. 3.10 for h = 3, with a tendency toward poor generalization. In the present case it was found that h should not be higher than 2.5.
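To make the role of h concrete, the following minimal Python sketch (not from the text; the bimodal error sample and the function names are illustrative assumptions) contrasts Gaussian-kernel Parzen estimates of the error PDF, together with the corresponding resubstitution Shannon-entropy estimate, for a small, an intermediate, and a large bandwidth. A too-small h resolves spurious spikes, while a too-large h merges the two class-conditional components into a single smooth bump.

```python
import numpy as np

def parzen_pdf(errors, grid, h):
    """Gaussian-kernel Parzen estimate of the error PDF evaluated on 'grid'."""
    diffs = (grid[:, None] - errors[None, :]) / h
    return np.exp(-0.5 * diffs ** 2).sum(axis=1) / (len(errors) * h * np.sqrt(2 * np.pi))

def shannon_entropy(errors, h):
    """Resubstitution estimate H_S = -(1/n) * sum_i log f_hat(e_i)."""
    return -np.mean(np.log(parzen_pdf(errors, errors, h) + 1e-12))

rng = np.random.default_rng(0)
# Illustrative bimodal error sample mimicking two class-conditional error components
errors = np.concatenate([rng.normal(-0.8, 0.2, 150), rng.normal(0.6, 0.3, 150)])
grid = np.linspace(-2.0, 2.0, 400)

for h in (0.3, 0.75, 3.0):
    f_hat = parzen_pdf(errors, grid, h)
    n_peaks = int(np.sum(np.diff(np.sign(np.diff(f_hat))) < 0))   # count local maxima
    print(f"h = {h}: {n_peaks} local maxima, H_S estimate = {shannon_entropy(errors, h):.3f}")
```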
Example 3.8. The class-conditional distributions of the input data for this
example are circular uniform, defined as:
$$
f_{X|t}(\mathbf{x}) = cu(\mathbf{x};\, \boldsymbol{\mu}_t, r_t) =
\begin{cases}
\dfrac{1}{\pi r_t^2}, & \|\mathbf{x} - \boldsymbol{\mu}_t\| \le r_t \\[4pt]
0, & \text{otherwise}
\end{cases}
\qquad (3.42)
$$
Let us consider 300-instance training and test datasets (150 instances per class), distributed as in (3.42) with μ1 = [0 0]^T, r1 = 1 and μ2 = [3 0]^T, r2 = 2. The (Shannon) MEE algorithm was applied with h = 0.7 (fat estimation) and η = 0.001.
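As a side illustration, here is a minimal Python sketch (an assumption, not code from the book) of how data distributed as in (3.42) could be generated for this example, drawing points uniformly on a disc via the usual square-root radius transform.

```python
import numpy as np

def sample_circular_uniform(n, mu, r, rng):
    """Draw n points uniformly on the disc of radius r centred at mu (cf. Eq. 3.42)."""
    theta = rng.uniform(0.0, 2.0 * np.pi, n)
    # sqrt of a uniform variate yields a radius whose density is uniform over the disc area
    rho = r * np.sqrt(rng.uniform(0.0, 1.0, n))
    return np.column_stack([mu[0] + rho * np.cos(theta),
                            mu[1] + rho * np.sin(theta)])

rng = np.random.default_rng(0)
# 150 instances per class, with the centres and radii of Example 3.8
X1 = sample_circular_uniform(150, mu=np.array([0.0, 0.0]), r=1.0, rng=rng)
X2 = sample_circular_uniform(150, mu=np.array([3.0, 0.0]), r=2.0, rng=rng)
X = np.vstack([X1, X2])
t = np.concatenate([-np.ones(150), np.ones(150)])  # class targets
```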
Figure 3.11 shows the evolution of the decision border and the error PDF
(computed only at the e i values) in one experiment. Note the two peaks of