parameters w_{N+1} and τ_{N+1}. The following notation will be used: X_N, y_N, M_N, and c_N denote the input, output, matching matrix, and match count respectively, after N observations. Similarly, X_{N+1}, y_{N+1}, M_{N+1}, c_{N+1} stand for the same objects after knowing the additional observation (x_{N+1}, y_{N+1}).
Several methods can be used to perform the model parameter update, ranging from computationally simple gradient-based approaches to more complex, but also more stable, methods. Since quickly obtaining a good idea of the quality of a classifier's model is important, and since the noise precision quality measure after (5.6) relies on the weight estimate, the speed of convergence in estimating both w and τ needs to be considered in addition to the computational costs of the methods.
First, a well-known principle from adaptive filter theory concerning the optimality of incremental linear models will be derived. Then some gradient-based approaches are considered, followed by approaches that recursively track the least-squares solution. All of this concerns only the update of the weight vector w; similar methods are applied to the noise precision τ in Sect. 5.3.7.
5.3.1 The Principle of Orthogonality

The Principle of Orthogonality determines when the weight vector estimate w_N is optimal in the weighted least squares sense of (5.5):
Theorem 5.3 (Principle of Orthogonality (for example, [105])). The weight vector estimate w_N after N observations is optimal in the sense of (5.5) if the sequence of inputs {x_1, ..., x_N} is M_N-orthogonal to the sequence of estimation errors {(w_N^T x_1 − y_1), ..., (w_N^T x_N − y_N)}, that is,

    ⟨X_N, X_N w_N − y_N⟩_{M_N} = Σ_{n=1}^{N} m(x_n) x_n (w_N^T x_n − y_n) = 0.    (5.16)
Proof. The solution of (5.5) is found by setting the first derivative of (5.7) to zero, which gives

    2 X_N^T M_N X_N w_N − 2 X_N^T M_N y_N = 0.

The result follows from rearranging the expression.
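As a quick numerical sanity check of (5.16), not taken from the text, the sketch below builds a random data set, forms a diagonal matching matrix M_N from assumed matching values m(x_n) in (0, 1], solves the weighted least-squares problem via the normal equations, and verifies that the M_N-weighted residual is orthogonal to the inputs. All variable names and the synthetic data are illustrative assumptions.

```python
# Sketch: numerically verify the Principle of Orthogonality (5.16).
# Assumed setup: N synthetic observations, matching values m(x_n) in (0, 1].
import numpy as np

rng = np.random.default_rng(0)
N, d = 50, 3
X = rng.normal(size=(N, d))            # X_N: one input x_n per row
y = rng.normal(size=N)                 # y_N: outputs
m = rng.uniform(0.1, 1.0, size=N)      # m(x_n): matching values
M = np.diag(m)                         # M_N: diagonal matching matrix

# Weighted least-squares estimate: solve X^T M X w = X^T M y
w = np.linalg.solve(X.T @ M @ X, X.T @ M @ y)

# Condition (5.16): sum_n m(x_n) x_n (w^T x_n - y_n) should vanish
residual = X.T @ M @ (X @ w - y)
print(np.allclose(residual, 0.0))      # True, up to floating-point precision
```

Because w_N solves the normal equations exactly, the weighted residual vector is zero up to rounding error, which is precisely the M_N-orthogonality the theorem states.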
By multiplying (5.16) from the left by w_N^T, a similar statement can be made about the output estimates:
Corollary 5.4 (Corollary to the Principle of Orthogonality (for example, [105])). The weight vector estimate w_N after N observations is optimal in the sense of (5.5) if the sequence of output estimates {w_N^T x_1, ..., w_N^T x_N} is M_N-orthogonal to the sequence of estimation errors {(w_N^T x_1 − y_1), ..., (w_N^T x_N − y_N)}, that is,

    ⟨X_N w_N, X_N w_N − y_N⟩_{M_N} = Σ_{n=1}^{N} m(x_n) (w_N^T x_n)(w_N^T x_n − y_n) = 0.    (5.17)
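The corollary can be checked numerically in the same way; since (5.17) is (5.16) left-multiplied by w_N^T, the check reduces to a single scalar. The sketch below uses the same kind of assumed synthetic setup (random data, diagonal matching matrix) and is not taken from the text.

```python
# Sketch: numerically verify Corollary 5.4 / equation (5.17).
# Assumed synthetic setup, mirroring the check of Theorem 5.3.
import numpy as np

rng = np.random.default_rng(1)
N, d = 50, 3
X = rng.normal(size=(N, d))
y = rng.normal(size=N)
M = np.diag(rng.uniform(0.1, 1.0, size=N))   # M_N from matching values

w = np.linalg.solve(X.T @ M @ X, X.T @ M @ y)

# (5.17): sum_n m(x_n) (w^T x_n)(w^T x_n - y_n) -- a scalar, since it is
# (5.16) multiplied from the left by w_N^T
s = (X @ w) @ M @ (X @ w - y)
print(abs(s) < 1e-8)                          # True: the sum vanishes
```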