Mixing Independently Trained Classifiers - Design and Analysis of Learning Classifier Systems


related previous study [83], to linear LCS models, and to models that treat

classifiers and mixing model as separate components by design.

6.1 Using the Generalised Softmax Function

By relating the probabilistic structure of LCS to the Mixtures-of-Experts model in Chap. 4, the probability of classifier $k$ generating the $n$th observation is given by the generalised softmax function (4.22), that is,

\[ g_k(x_n) = \frac{m_k(x_n) \exp(v_k^T \phi(x_n))}{\sum_{j=1}^K m_j(x_n) \exp(v_j^T \phi(x_n))}, \tag{6.2} \]

where $V = \{v_k\}$ is the set of mixing model parameters $v_k \in \mathbb{R}^{D_V}$, and $\phi(x)$ is a transfer function that maps the input space $\mathcal{X}$ into some $D_V$-dimensional real space $\mathbb{R}^{D_V}$. In LCS, this function is usually $\phi(x) = 1$ for all $x \in \mathcal{X}$, with $D_V = 1$, but to stay general, we do not make any assumptions about the form of $\phi$.
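To make (6.2) concrete, the following is a minimal sketch in plain Python. The function name `generalised_softmax` and its argument layout are illustrative assumptions, not part of the text; subtracting the largest activation before exponentiating is a standard numerical safeguard that leaves the ratio in (6.2) unchanged.

```python
import math

def generalised_softmax(matching, v, phi_x):
    """Evaluate g_k(x) of (6.2) for all classifiers k at a single input x.

    matching: list of m_k(x), one value per classifier (0 = does not match)
    v:        list of parameter vectors v_k, each a list of length D_V
    phi_x:    transfer function value phi(x), a list of length D_V
    Assumes that at least one classifier matches x.
    """
    # Activations v_k^T phi(x)
    acts = [sum(vi * pi for vi, pi in zip(v_k, phi_x)) for v_k in v]
    # Shift by the largest activation among matching classifiers
    # (numerical stability only; the shift cancels in the ratio).
    shift = max(a for a, m in zip(acts, matching) if m > 0)
    weighted = [m * math.exp(a - shift) if m > 0 else 0.0
                for m, a in zip(matching, acts)]
    total = sum(weighted)
    return [w / total for w in weighted]

# Usual LCS setting: phi(x) = 1 and D_V = 1; the third classifier does not match.
g = generalised_softmax([1, 1, 0], [[0.0], [math.log(3.0)], [5.0]], [1.0])
```

In this example the two matching classifiers share the probability mass in the ratio 1:3, and the non-matching classifier receives zero weight regardless of its parameter.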

Assuming knowledge of the predictive densities of all classifiers $p(y | x, \theta_k)$, the data likelihood (6.1) is maximised by the expectation-maximisation algorithm by finding the values for $V$ that maximise (4.13), given by

\[ \sum_{n=1}^N \sum_{k=1}^K r_{nk} \ln g_k(x_n). \tag{6.3} \]

In the above equation, $r_{nk}$ stands for the responsibility of classifier $k$ for observation $n$, given by (4.12), that is

\[ r_{nk} = \frac{g_k(x_n) \, p(y_n | x_n, \theta_k)}{\sum_{j=1}^K g_j(x_n) \, p(y_n | x_n, \theta_j)}. \tag{6.4} \]
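The responsibilities can be computed directly from the mixing weights and the classifier densities. The sketch below handles a single observation; the name `responsibilities` and the list-based inputs are assumptions made for illustration.

```python
def responsibilities(g, densities):
    """Responsibilities r_nk for one observation n.

    g:         list of mixing weights g_k(x_n), one per classifier
    densities: list of predictive densities p(y_n | x_n, theta_k)
    Assumes at least one classifier has non-zero weight and density.
    """
    # Numerators g_k(x_n) p(y_n | x_n, theta_k)
    joint = [g_k * p_k for g_k, p_k in zip(g, densities)]
    # Normalise over all classifiers j = 1..K
    total = sum(joint)
    return [j / total for j in joint]

# Hypothetical values: the second classifier has the larger mixing weight,
# but the first explains the observation better, so they end up sharing it.
r = responsibilities([0.25, 0.75, 0.0], [0.6, 0.2, 0.9])
```

A classifier with zero mixing weight receives zero responsibility, however well it would have predicted the observation.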

Thus, we want to fit the mixing model to the data by minimising the cross-entropy $-\sum_n \sum_k r_{nk} \ln g_k(x_n)$ between the responsibilities and the generative mixing model.
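As a rough illustration of this objective, the following sketch evaluates the cross-entropy for given responsibilities and mixing weights. The function name `cross_entropy` and the nested-list layout (one row per observation) are assumptions; terms with zero responsibility are skipped, following the convention that 0 ln 0 is taken as 0.

```python
import math

def cross_entropy(R, G):
    """Compute -sum_n sum_k r_nk ln g_k(x_n).

    R: N x K nested list of responsibilities r_nk
    G: N x K nested list of mixing weights g_k(x_n)
    """
    total = 0.0
    for r_n, g_n in zip(R, G):
        for r_nk, g_nk in zip(r_n, g_n):
            if r_nk > 0:  # zero-responsibility terms contribute nothing
                total -= r_nk * math.log(g_nk)
    return total

# Hypothetical single observation with three classifiers.
R = [[0.5, 0.5, 0.0]]
G = [[0.25, 0.75, 0.0]]
mismatch = cross_entropy(R, G)
matched = cross_entropy(R, R)  # mixing weights equal to the responsibilities
```

For fixed responsibilities, the cross-entropy is smallest when the mixing weights equal the responsibilities, so `matched` is no larger than `mismatch`. Whether the mixing model can actually attain this depends on the form of $\phi$.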

6.1.1 Batch Learning by Iterative Reweighted Least Squares

The softmax function is a generalised linear model, and specialised tools have been developed to fit such models [165]. Even though a generalisation of this function is used, the same tools are applicable, as shown in this section. In particular, the Iterative Reweighted Least Squares (IRLS) algorithm will be employed to find the mixing model parameters.

The IRLS can be derived by applying the Newton-Raphson iterative optimisation scheme [19] that, for minimising an error function $E(V)$, takes the form

\[ V^{(\text{new})} = V^{(\text{old})} - H^{-1} \nabla E(V), \tag{6.5} \]
