[Fig. 3.10 panels: the dataset in the (x1, x2) plane; the estimated error PDF Ê(e); the entropy estimate Ŝ and the error rate along 60 training epochs; Error Rate (Test) = 0.237.]
Fig. 3.10 An experiment run for Example 3.7 with a high value of h: h = 3.
the error PDF, which at a later stage merge into each other, resulting in a monomodal PDF. Figure 3.12 shows the final solution, corresponding to a converged behavior of the algorithm with low entropy and error rates. This solution has P_ed = 0, which is in fact the min P_e value for this problem and coincides with the Bayes error. The test set error rate is also very close to zero.
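The behavior just described rests on a Parzen-window estimate of the error PDF and on the error entropy computed from it. The following is a minimal sketch of that computation, assuming (as is usual in MEE training) a Gaussian kernel of bandwidth h; the function names, the integration grid and the example at the end are ours, not from the text.

```python
import numpy as np

def gaussian_kernel(u, h):
    """Gaussian kernel with bandwidth (smoothing parameter) h."""
    return np.exp(-u ** 2 / (2.0 * h ** 2)) / (np.sqrt(2.0 * np.pi) * h)

def error_pdf_estimate(errors, grid, h):
    """Parzen-window estimate of the error PDF, evaluated on 'grid':
    f_hat(e) = (1/n) * sum_i G_h(e - e_i)."""
    return gaussian_kernel(grid[:, None] - errors[None, :], h).mean(axis=1)

def renyi_quadratic_entropy(errors, h):
    """Renyi's quadratic entropy estimate, H_R2 = -log V, where V is the
    information potential: the mean of pairwise Gaussian kernels of the
    error differences, with bandwidth h*sqrt(2)."""
    diffs = errors[:, None] - errors[None, :]
    V = gaussian_kernel(diffs, h * np.sqrt(2.0)).mean()
    return -np.log(V)

def shannon_entropy(errors, h, grid_size=500):
    """Plug-in Shannon entropy estimate computed from the Parzen PDF."""
    grid = np.linspace(errors.min() - 3 * h, errors.max() + 3 * h, grid_size)
    f = np.clip(error_pdf_estimate(errors, grid, h), 1e-12, None)
    return -np.sum(f * np.log(f)) * (grid[1] - grid[0])

# Example: errors of a converged solution concentrate near zero,
# yielding a narrow monomodal PDF and a low entropy value.
errors = np.random.normal(0.0, 0.1, size=200)
print(renyi_quadratic_entropy(errors, h=3.0), shannon_entropy(errors, h=3.0))
```

The bandwidth h controls how strongly Ê(e) is smoothed; Fig. 3.10 corresponds to a deliberately high value, h = 3.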
The results of Examples 3.7 and 3.8 exhibit convergence to a minimum error rate together with a minimum error entropy, but they are just one run of the algorithm; they do not tell us whether the MEE perceptron learns in a consistent way and whether or not it converges towards min P_e. Experimental evidence on these issues can be gained from the learning curves, as we did for the linear discriminant.
Figure 3.13a corresponds to generating the Gaussian dataset of Example 3.7 thirty times, for a grid of n values in [10, 250], and computing averages and standard deviations of the training set and test set error rates, P_ed(n) and P_et(n), over the 30 repetitions. The maximum number of epochs was 60 and η = 0.001. Perceptron training in these experiments used Rényi's quadratic EE. Figure 3.13b corresponds to analogous experiments for the circular uniform dataset with parameters as in Example 3.8, except for r1, which was set to 3. Shannon EE was used in these experiments.
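A sketch of this experimental protocol is given below. The dataset generator, the MEE training routine, the error-rate function, the step of the n grid and the test-set size are hypothetical placeholders; only the 30 repetitions, the 60-epoch limit, η = 0.001 and the range [10, 250] come from the text. Figure 3.13a would use Rényi's quadratic EE inside the training routine and Fig. 3.13b Shannon's EE.

```python
import numpy as np

def learning_curves(generate_dataset, train_mee_perceptron, error_rate,
                    n_grid, n_runs=30, max_epochs=60, eta=0.001, n_test=2000):
    """Mean and standard deviation of the training and test error rates,
    P_ed(n) and P_et(n), over n_runs repetitions for each training set size n."""
    ped = np.zeros((n_runs, len(n_grid)))
    pet = np.zeros((n_runs, len(n_grid)))
    for r in range(n_runs):
        for k, n in enumerate(n_grid):
            X_tr, t_tr = generate_dataset(n)        # fresh training set of size n
            X_te, t_te = generate_dataset(n_test)   # independent test set
            w = train_mee_perceptron(X_tr, t_tr, max_epochs=max_epochs, eta=eta)
            ped[r, k] = error_rate(w, X_tr, t_tr)   # training set error rate
            pet[r, k] = error_rate(w, X_te, t_te)   # test set error rate
    return (ped.mean(axis=0), ped.std(axis=0),
            pet.mean(axis=0), pet.std(axis=0))

# Grid of training set sizes in [10, 250], as in the text.
n_grid = np.arange(10, 251, 10)
```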
The learning curves P_ed, P_et clearly converge with n, with decreasing standard deviations, as expected. The asymptotic error rates in Fig. 3.13 are very close to the theoretical min P_e values, 0.228 and 0.054 respectively.