weight vector estimate is a basic requirement for estimating the noise variance,
the applied LMS method seems to perform even worse when estimating the
noise variance than when estimating the weight. As can be seen in Fig. 5.1,
direct tracking of the noise variance in combination with the RLS algorithm
for a stable weight estimate gives the least noisy and most accurate estimate. Indeed,
while there is no significant difference in the squared estimation error between
the NLMS and RLSLMS classifiers (randomised ANOVA: F_alg(2, 2850) = 53.68,
F_alg,.001 = 29.26, p < .001; Tukey's HSD: p > .05), the RLS classifier features
a significantly better estimate than both of the other classifier types (Tukey's
HSD: p < .01 for both NLMS and RLSLMS).
Conceptually, the same pattern is observed in the second experiment, as shown
in Fig. 5.2. However, in this case, the influence of a badly estimated weight
vector becomes clearer, and is particularly visible for the NLMS classifier. Recall
that this figure shows the estimation errors rather than the estimates themselves;
hence, the upper graph shows that the NLMS classifier provides estimates
comparable to those of the RLSLMS and RLS classifiers only after 30 observations.
The performance of NLMS in the case of ill-conditioned inputs is even worse; its
estimation performance never matches that of the classifiers that utilise the RLS
algorithm for their weight vector estimate. In contrast to the first experiment,
there is no significant difference between the noise variance estimation errors of
the RLSLMS and RLS classifiers, but in both cases they are significantly better
than the NLMS classifier (for i_n ∈ [0, π/2]: randomised ANOVA: F_alg(2, 2850) =
171.41, F_alg,.001 = 32.81, p < .001; Tukey's HSD: NLMS vs. RLSLMS and RLS
p < .01, RLSLMS vs. RLS p > .05; for i_n ∈ [π/2]: randomised ANOVA:
F_alg(2, 17100) = 4268.7, F_alg,.001 = 577.89, p < .001; Tukey's HSD: NLMS vs.
RLS and RLSLMS p < .01, RLSLMS vs. RLS p > .05).
In summary, the two experiments together demonstrate that to provide a
good noise variance estimate, the method needs to estimate the weight vector
well, and that tracking this weight vector directly, as the RLS algorithm does,
is better than estimating it with the LMS algorithm.
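The contrast drawn by these experiments can be sketched in code. The following minimal simulation is an illustration only, not the book's actual experimental setup: the data-generating model, step size, initialisation and sample count are all assumed values. It combines an NLMS and an RLS weight estimate, each with direct tracking of the noise variance as a running mean of squared residuals, and shows that the gradient-based NLMS weight estimate inflates the variance estimate.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative setup: scalar-output linear model y = w'x + Gaussian noise.
D = 3
w_true = rng.normal(size=D)
sigma2_true = 0.25          # true noise variance
T = 5000

# NLMS weight estimate, with direct tracking of the noise variance.
w_nlms = np.zeros(D)
var_nlms = 0.0
gamma = 1.0                 # assumed NLMS step size

# RLS weight estimate (forgetting factor 1), same variance tracker.
w_rls = np.zeros(D)
P = np.eye(D) * 1000.0      # large initial inverse-Hessian estimate
var_rls = 0.0

for t in range(1, T + 1):
    x = rng.normal(size=D)
    y = w_true @ x + rng.normal(scale=np.sqrt(sigma2_true))

    # NLMS: gradient step along the input, normalised by its squared norm.
    e = y - w_nlms @ x
    w_nlms += gamma * e * x / (x @ x)
    # Direct variance tracking: running mean of the squared a-priori residual.
    var_nlms += (e * e - var_nlms) / t

    # RLS: exact recursive least-squares update of the weight vector.
    k = P @ x / (1.0 + x @ P @ x)
    r = y - w_rls @ x
    w_rls += k * r
    P -= np.outer(k, x @ P)
    var_rls += (r * r - var_rls) / t
```

Because the NLMS weight estimate never settles exactly on the least-squares solution, its residuals contain weight-estimation error on top of the noise, so var_nlms overestimates the noise variance, while var_rls ends up close to it; this mirrors the pattern reported for the NLMS versus RLS classifiers above.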
5.5 Classification Models
After having extensively covered the training of linear regression classifier
models, let us turn our focus to classification models. In this case, we assume
the input space and output space to be X = R^{D_X} and Y = {0, 1}^{D_Y},
where D_Y is the number of classes of the problem. An output vector y
representing class j is 0 in all its elements except for y_j = 1.
Taking the generative point of view, a classifier is assumed to have generated
an observation of some class with a certain probability, independent of the
associated input, resulting in the classifier model

    p(y | x, w) = ∏_{j=1}^{D_Y} w_j^{y_j},    with    ∑_{j=1}^{D_Y} w_j = 1.        (5.77)
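The model of Eq. (5.77) can be made concrete with a short sketch. The class probabilities below are made-up illustrative values; the sketch shows the 1-of-D_Y output encoding and that, because y is zero everywhere except y_j = 1, the product in Eq. (5.77) collapses to the single probability w_j.

```python
import numpy as np

# Illustrative class probabilities; Eq. (5.77) requires they sum to 1.
D_Y = 3
w = np.array([0.2, 0.5, 0.3])
assert np.isclose(w.sum(), 1.0)

def one_hot(j, d=D_Y):
    """Output vector representing class j: 0 everywhere except y_j = 1."""
    y = np.zeros(d)
    y[j] = 1.0
    return y

def p_y_given_x_w(y, w):
    """p(y | x, w) = prod_j w_j^{y_j}; x is unused, as the model is
    independent of the input by construction."""
    return float(np.prod(w ** y))

p = p_y_given_x_w(one_hot(1), w)  # collapses to w_1 = 0.5
```

Note that summing p(y | x, w) over the D_Y one-hot vectors recovers the constraint ∑_j w_j = 1, so the model is a properly normalised distribution over the classes.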