Figures 5.1 and 5.2 show one run of training the classifiers on f_1 and f_2, respectively. Figure 5.1 illustrates how the weight and noise variance estimates differ between classifiers when trained on the same 50 observations. Figure 5.2, on the other hand, does not display the estimates themselves, but rather shows the error of the weight vector and noise variance estimates. Let us first focus on the ability of the different classifiers to estimate the weight vector.
5.4.2 Weight Vector Estimate
In the following, the RLSLMS classifier will be ignored, as it is equivalent to the RLS classifier when estimating the weight vector. Figure 5.1 shows that while both the NLMS and the RLS algorithm estimate the weight to be about w = 5, the RLS algorithm is more stable in its estimate. In fact, comparing the model MSEs by the randomised ANOVA procedure reveals that this error is significantly lower for the RLS method (randomised ANOVA: F_alg(2, 2850) = 38.0, F_alg,.01 = 25.26, p < .01). Figure 5.1 also clearly illustrates that utilising the MAM causes the weight estimates to be initially equivalent to the RLS estimates, until 1/γ = 5 observations are reached. As the input to the averaging classifier is always x_n = 1, the speed of convergence of the LMS classifier is independent of these inputs.
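To make the comparison concrete, the following is a minimal sketch (in Python, not taken from the original experiments) of the three update rules on the averaging task: NLMS, RLS, and LMS with the MAM heuristic. The step size γ = 0.2 (chosen so that 1/γ = 5 matches the switch point above), the target weight of 5, and the unit noise variance are assumptions based on the values quoted in this section.

    # Minimal sketch: NLMS vs. RLS vs. MAM on the averaging task, where
    # every input is x_n = 1 and y_n = 5 + Gaussian noise of variance 1.
    # The step size gamma = 0.2 is an assumption, not the book's setting.
    import numpy as np

    rng = np.random.default_rng(0)
    gamma = 0.2            # assumed NLMS/LMS step size, so 1/gamma = 5
    w_nlms, w_rls, w_mam = 0.0, 0.0, 0.0
    P = 1e6                # large initial (scalar) covariance for RLS

    for n in range(1, 51):
        x, y = 1.0, 5.0 + rng.normal(0.0, 1.0)

        # NLMS: normalised gradient step; with x = 1 this is plain LMS
        w_nlms += gamma * (x / x**2) * (y - w_nlms * x)

        # RLS: exact recursive least-squares update
        k = P * x / (1.0 + x * P * x)
        w_rls += k * (y - x * w_rls)
        P -= k * x * P

        # MAM: exact running average (RLS-equivalent for x = 1) until
        # 1/gamma observations, then the plain LMS update
        if n <= 1.0 / gamma:
            w_mam += (y - w_mam) / n
        else:
            w_mam += gamma * (y - w_mam)

        print(f"n={n:2d}  NLMS={w_nlms:5.2f}  RLS={w_rls:5.2f}  MAM={w_mam:5.2f}")

With x_n = 1 the NLMS normalisation has no effect, which is why its convergence is independent of the inputs, while RLS and the initial MAM phase both track the exact running mean of the observed outputs.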
The second experiment, on the other hand, demonstrates how ill-conditioned inputs cause the convergence speed of the NLMS algorithm to deteriorate. The upper graph of Figure 5.2 shows that while the weight estimate is close to optimal after 10 observations for the RLS classifier, the NLMS classifier requires more than 50 observations to reach a similar performance when modelling f_2 over i_n ∈ [0, π/2). Even worse, changing the sampling range to i_n ∈ [π/2, π) causes the NLMS performance to drop such that it still features an MSE of around 0.1 after 300 observations, while the performance of the RLS classifier remains unchanged, as shown by the lower graph of Figure 5.2. This drop can be explained by the increasing eigenvalues of c_N^{-1} X_N^T M_N X_N, which reduce the speed of convergence (see Sect. 5.2.5). The minimal MSE of a linear model is in both cases approximately 0.00394, and the difference in performance between the NLMS and the RLS classifier is in both cases significant (randomised ANOVA for i_n ∈ [0, π/2): F_alg(2, 2850) = 973.0, F_alg,.001 = 93.18, p < .001; randomised ANOVA for i_n ∈ [π/2, π): F_alg(2, 17100) = 88371.5, F_alg,.001 = 2190.0, p < .001).
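The conditioning argument can be illustrated numerically. The sketch below assumes inputs of the form x_n = (1, i_n)^T and ignores matching by setting M_N = I (both are assumptions, as this section does not restate the input structure); it computes the eigenvalues of the time-averaged input correlation matrix for the two sampling ranges.

    # Sketch: eigenvalue spread of c_N^{-1} X_N^T X_N for the two sampling
    # ranges, with matching ignored (M_N = I) and assumed inputs (1, i_n).
    import numpy as np

    rng = np.random.default_rng(0)
    for lo, hi in [(0.0, np.pi / 2), (np.pi / 2, np.pi)]:
        i_n = rng.uniform(lo, hi, size=1000)
        X = np.column_stack([np.ones_like(i_n), i_n])
        C = X.T @ X / len(i_n)            # time-averaged input correlation
        eig = np.linalg.eigvalsh(C)       # ascending eigenvalues
        print(f"i_n in [{lo:.2f}, {hi:.2f}): eigenvalues {eig.round(3)}, "
              f"condition number {eig[1] / eig[0]:.1f}")

For [0, π/2) the condition number of the correlation matrix is modest, while for [π/2, π) it is roughly an order of magnitude larger; this growing eigenvalue spread is what slows the gradient-based NLMS updates, whereas RLS is unaffected by it.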
5.4.3 Noise Variance Estimate
As the noise variance estimate depends by (5.63) on a good estimate of the weight vector, classifiers that perform poorly at estimating the weight vector cannot be expected to perform any better when estimating the noise variance. This expectation is confirmed by the noise variance estimate of the NLMS classifier in Fig. 5.1, which fluctuates heavily around the correct value of 1. While the RLSLMS classifier produces the same weight estimate as the RLS classifier, its noise variance estimate fluctuates almost as heavily as that of the NLMS classifier, as it also uses LMS to perform this estimate. Thus, while a good
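To illustrate the dependence that (5.63) expresses, the following sketch estimates the noise variance from the residuals of a fitted weight vector on the averaging task, once with the least-squares weight and once with a deliberately perturbed weight. The estimator form used here, the sum of squared residuals divided by the degrees of freedom, is an assumption, as (5.63) itself is not reproduced in this section.

    # Sketch: how the quality of the weight estimate feeds into the noise
    # variance estimate. The estimator form is an assumed stand-in for (5.63).
    import numpy as np

    rng = np.random.default_rng(0)
    N, D = 50, 1
    X = np.ones((N, D))                    # averaging task: x_n = 1
    y = 5.0 + rng.normal(0.0, 1.0, N)      # true weight 5, true noise variance 1

    w_ls = np.linalg.lstsq(X, y, rcond=None)[0]  # batch least-squares weight
    w_bad = w_ls + 0.5                           # deliberately perturbed weight

    for name, w in [("least-squares w", w_ls), ("perturbed w", w_bad)]:
        resid = y - X @ w
        var_hat = resid @ resid / (N - D)  # unbiased only for the LS weight
        print(f"{name}: estimated noise variance {var_hat:.3f} (true value 1)")

A poor weight estimate inflates the residuals and with them the variance estimate, which is consistent with the heavy fluctuation of the LMS-based estimates around the correct value of 1 in Fig. 5.1.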
 