Mixing Independently Trained Classifiers - Design and Analysis of Learning Classifier Systems

Information Technology Reference

In-Depth Information

( N ( KD V ) 3 ), which prohibits its application

in LCS, where we require algorithms that preferably scale linearly with the num-

ber of classifiers. Nonetheless, it is of significant theoretical value, as it provides

the values for V that maximise (6.3) and can therefore act as a benchmark for

other mixing models and their associated methods.

IRLS algorithm is of complexity

6.1.2

Incremental Learning by Least Squares

Following a similar but slightly modified derivation to the one give by Jordan

and Jacobs [121], we can incrementally approximate the maximum of (6.3) by

a recursive least squares procedure that is of lower complexity than the IRLS

algorithm. Due to the convexity of E ( V ), its unique minimum is found when its

gradient is

∇ V E ( V )= 0 ,thatis,when V satisfies

( g k ( x n )

−

r nk ) φ ( x n )= 0 ,

k =1 ,...,K.

(6.11)

n =1

Substituting (6.2) for g k , we want to solve

m k ( x n )

φ ( x n )= 0

exp( v k φ ( x n ))

r nk

m k ( x n )

j =1 m j ( x n )exp( v j φ ( x n )) −

(6.12)

n =1

Thus, the difference between the left-hand term and the right-hand term inside

the brackets is to be minimised, weighted by m k ( x n ), such that

exp( v k φ ( x n ))

j =1 m j ( x n )exp( v j φ ( x n )) ≈

r nk

m k ( x n ) ,

m k ( x n )

(6.13)

holds for all n .Solvingtheabovefor v k φ ( x n ), its desired target values is

r nk

m k ( x n ) −

ln C n ,

(6.14)

where C n = j m j ( x n )exp( v j φ ( x n )) is the normalising term that is common

to all v k φ ( x n ) and can therefore be omitted, as it disappears when v k φ ( x n )is

converted to g k ( x n ). Therefore, the target for v k φ ( x k )isln

r nk

m k ( x n ) ,weightedby

m k ( x n ). This allows us to reformulate the problem of finding values for V that

maximise (6.3) as the K linear least squares problems of minimising

m k ( x n ) v k φ ( x n )

r nk

m k ( x n )

−

k =1 ,...,K.

(6.15)

n =1

r nk

Even though r nk =0if m k ( x n ) = 0, and therefore

m k ( x n ) is undefined in

such a case, this does not cause any problems, as in such a case the weight

is equally zero which makes computing the target superfluous. Also note that

Design and Analysis of Learning Classifier Systems

Search WWH ::

Custom Search

Home