Training the Classifiers - Design and Analysis of Learning Classifier Systems


where

$$c_k = \sum_{n=1}^{N} m_k(x_n) = \mathrm{Tr}(M_k), \tag{5.12}$$

is the match count of classifier $k$, and is in this chapter simply denoted $c$. $\mathrm{Tr}(M)$ denotes the trace of the matrix $M$, which is the sum of its diagonal elements. Hence, the inverse noise precision, that is, the noise variance, is given by the average squared error of the model output estimates over all matched observations. Note, however, that the precision estimate is biased, as it is based on another estimate $w$ [97, Chap. 5]. This can be accounted for by instead using

$$\tau^{-1} = (c - D_X)^{-1} \, \| X w - y \|_M^2, \tag{5.13}$$

which is the unbiased estimate of the noise precision.

To summarise, the maximum likelihood model parameters of a classifier using batch learning are found by first evaluating (5.8) to get $w$ and then (5.13) to get $\tau$.
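The two-step procedure above can be sketched in Python with NumPy. This is a minimal illustration, not the book's implementation: it assumes that (5.8), which is not reproduced in this section, is the matching-weighted least squares solution $w = (X^T M X)^{-1} X^T M y$ with $M$ the diagonal matrix of matching values. The function name `batch_fit` and the data are hypothetical.

```python
import numpy as np


def batch_fit(X, y, m):
    """Matching-weighted batch fit for a single classifier.

    X: (N, D_X) inputs, y: (N,) outputs, m: (N,) matching values m(x_n).
    ASSUMPTION: (5.8) is w = (X^T M X)^{-1} X^T M y with M = diag(m).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    m = np.asarray(m, dtype=float)

    XtM = X.T * m                           # X^T M without forming the N x N matrix M
    w = np.linalg.solve(XtM @ X, XtM @ y)   # assumed form of (5.8)

    c = m.sum()                             # match count c = Tr(M), see (5.12)
    d_x = X.shape[1]                        # D_X, number of model inputs
    residual = X @ w - y
    noise_var = (m * residual**2).sum() / (c - d_x)   # tau^{-1}, see (5.13)
    return w, 1.0 / noise_var


# Hypothetical data: the classifier matches the first four of five observations.
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0], [1.0, 4.0]])
y = np.array([0.1, 0.9, 2.1, 2.9, 10.0])
m = np.array([1.0, 1.0, 1.0, 1.0, 0.0])
w, tau = batch_fit(X, y, m)
```

The sketch requires the match count $c$ to exceed $D_X$ and the residuals not to be all zero; otherwise the division in (5.13) or the final inversion is undefined. The unmatched fifth observation has no influence on either estimate.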

Example 5.2 (Batch Learning with Averaging Classifiers). Averaging classifiers are characterised by using $x_n = 1$ for all $n$ for their linear model. Hence, we have $X = (1, \dots, 1)^T$, and evaluating (5.8) results in the scalar weight estimate

$$w = c^{-1} \sum_{n=1}^{N} m(x_n) \, y_n, \tag{5.14}$$

which is the outputs $y_n$ averaged over all matched inputs. Note that, as discussed in Sect. 3.2.3, the inputs to the matching function as appearing in $m(x_n)$ are not necessarily the same as the ones used to build the local model. In the case of averaging classifiers this differentiation is essential, as the inputs $x_n = 1$ used for building the local models do not carry any information that can be used for localisation of the classifiers.

The noise precision is determined by evaluating (5.13) and results in

$$\tau^{-1} = (c - 1)^{-1} \sum_{n=1}^{N} m(x_n) \, (w - y_n)^2, \tag{5.15}$$

which is the unbiased average over the squared deviation of the outputs from their average, and hence indicates the prediction error that can be expected from the linear model.
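As a small numerical illustration of Example 5.2, the following Python sketch evaluates (5.14) and (5.15) directly. The function name `averaging_classifier` and the data are hypothetical, and the matching values are taken as given rather than computed from a matching function.

```python
def averaging_classifier(y, m):
    """Weight and noise precision of an averaging classifier.

    y: outputs y_n, m: matching values m(x_n) for the same observations.
    """
    c = sum(m)                                          # match count
    w = sum(m_n * y_n for m_n, y_n in zip(m, y)) / c    # (5.14): matched average
    noise_var = sum(m_n * (w - y_n) ** 2
                    for m_n, y_n in zip(m, y)) / (c - 1)  # (5.15): tau^{-1}
    return w, 1.0 / noise_var


# Hypothetical data: the classifier matches the first four of five observations.
y = [1.0, 2.0, 3.0, 4.0, 100.0]
m = [1, 1, 1, 1, 0]
w, tau = averaging_classifier(y, m)
```

Here the weight is the plain average of the four matched outputs, and the unmatched output 100.0 affects neither estimate. The sketch assumes $c > 1$ and that the matched outputs are not all identical, since otherwise the variance estimate or its inverse is undefined.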

5.3 Incremental Learning Approaches to Regression

Having derived the batch learning solution, let us now consider the case where we want to update our model with each additional observation. In particular, assume that the model parameters $w_N$ and $\tau_N$ are based on $N$ observations, and the new observation $(x_{N+1}, y_{N+1})$ is to be incorporated, to get the updated
