To get the variational bound of the whole model structure, and with it the lower bound on the logarithm of the model evidence $\ln p(\mathbf{Y})$, we need to compute

$$\mathcal{L}(q) = \sum_k \mathcal{L}_k(q) + \mathcal{L}_M(q), \qquad (7.96)$$

where $\mathcal{L}_k(q)$ and $\mathcal{L}_M(q)$ are given by (7.91) and (7.95), respectively.

Training the model means maximising $\mathcal{L}(q)$ (7.96) with respect to its parameters $\{\mathbf{W}_k, \boldsymbol{\Lambda}_k, a_{\tau_k}, b_{\tau_k}, a_{\alpha_k}, b_{\alpha_k}, \mathbf{V}, \boldsymbol{\Lambda}_V, a_{\beta_k}, b_{\beta_k}\}$. In fact, deriving the maximum of $\mathcal{L}(q)$ with respect to each of these parameters separately while keeping the others constant results in the variational update equations that were derived in the previous sections [19].
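Schematically, this parameter-wise maximisation is a coordinate ascent on the bound. The following is a minimal Python sketch only; the update and bound callables are hypothetical stand-ins for the update equations of the previous sections, not part of the text:

```python
import numpy as np

def train(X, Y, classifier_updates, mixing_update, bound, max_iter=100, tol=1e-8):
    """Coordinate ascent on L(q): each step maximises the bound with respect
    to one parameter group while the others are held fixed, so L(q) can only
    increase and convergence is monotone.

    classifier_updates: one callable per classifier k, updating
        W_k, Lambda_k, a/b_tau_k, a/b_alpha_k (hypothetical stand-ins).
    mixing_update: callable updating V, Lambda_V, a/b_beta_k.
    bound: callable evaluating L(q) = sum_k L_k(q) + L_M(q) from (7.96).
    """
    L_old = -np.inf
    for _ in range(max_iter):
        for update_k in classifier_updates:
            update_k(X, Y)
        mixing_update(X, Y)
        L_new = bound(X, Y)
        if L_new - L_old < tol:   # the bound can only increase
            return L_new
        L_old = L_new
    return L_old
```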
7.3.9 Independent Classifier Training
As we can see from (7.91), we need to know the responsibilities $r_{nk}$ to train each of the classifiers. The mixing model, on the other hand, relies on the goodness-of-fit of the classifiers, as embedded in $g_k$ in (7.95). Therefore, classifiers and mixing model need to be trained in combination to maximise (7.96).
Taking this approach, however, introduces local optima in the training process,
as already discussed for the non-Bayesian MoE model in Sect. 4.1.5. Such local
optima make evaluating the model evidence for a single model structure too
costly to perform efficient model structure search, and so the training process
needs to be modified to remove these local optima. Following the same approach
as in Sect. 4.4, we train the classifiers independently of the mixing model.
More specifically, the classifiers are fully trained on all observations that they match, independently of other classifiers, and then combined by the mixing model. Formally, this is achieved by replacing the responsibilities $r_{nk}$ by the matching functions $m_k(\mathbf{x}_n)$.
The only required modification to the variational update equations is to change the classifier model updates from (7.30)-(7.33) to

$$\boldsymbol{\Lambda}_k = \mathbb{E}_\alpha(\alpha_k)\mathbf{I} + \sum_n m_k(\mathbf{x}_n)\,\mathbf{x}_n\mathbf{x}_n^T, \qquad (7.97)$$

$$\mathbf{w}_{kj} = \boldsymbol{\Lambda}_k^{-1}\sum_n m_k(\mathbf{x}_n)\,\mathbf{x}_n y_{nj}, \qquad (7.98)$$

$$a_{\tau_k} = a_\tau + \frac{1}{2}\sum_n m_k(\mathbf{x}_n), \qquad (7.99)$$

$$b_{\tau_k} = b_\tau + \frac{1}{2 D_Y}\sum_j \left( \sum_n m_k(\mathbf{x}_n)\, y_{nj}^2 - \mathbf{w}_{kj}^T\boldsymbol{\Lambda}_k\mathbf{w}_{kj} \right). \qquad (7.100)$$
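To make these updates concrete, here is a minimal NumPy sketch for a single classifier; the function name classifier_update and its argument layout are my own assumptions, with E_alpha_k standing for the expectation $\mathbb{E}_\alpha(\alpha_k)$ under the current $q(\alpha_k)$:

```python
import numpy as np

def classifier_update(X, Y, m_k, E_alpha_k, a_tau, b_tau):
    """Matching-weighted updates (7.97)-(7.100) for one classifier k.

    X: (N, D_X) inputs, Y: (N, D_Y) outputs,
    m_k: (N,) matching values m_k(x_n),
    E_alpha_k: scalar expectation E_alpha(alpha_k).
    """
    D_X, D_Y = X.shape[1], Y.shape[1]
    Xm = m_k[:, None] * X  # rows of X weighted by m_k(x_n)
    # (7.97): Lambda_k = E_alpha(alpha_k) I + sum_n m_k(x_n) x_n x_n^T
    Lambda_k = E_alpha_k * np.eye(D_X) + X.T @ Xm
    # (7.98): w_kj = Lambda_k^{-1} sum_n m_k(x_n) x_n y_nj, all outputs j
    # at once; column j of W_k is w_kj
    W_k = np.linalg.solve(Lambda_k, Xm.T @ Y)
    # (7.99): a_tau_k = a_tau + (1/2) sum_n m_k(x_n)
    a_tau_k = a_tau + 0.5 * m_k.sum()
    # (7.100): b_tau_k = b_tau + (1/(2 D_Y)) sum_j ( sum_n m_k(x_n) y_nj^2
    #                                                - w_kj^T Lambda_k w_kj )
    y2 = (m_k[:, None] * Y ** 2).sum(axis=0)             # sum_n m_k(x_n) y_nj^2
    quad = np.einsum('ij,ik,kj->j', W_k, Lambda_k, W_k)  # w_kj^T Lambda_k w_kj
    b_tau_k = b_tau + (y2 - quad).sum() / (2.0 * D_Y)
    return Lambda_k, W_k, a_tau_k, b_tau_k
```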
Thus, we are now effectively finding a $\mathbf{w}_{kj}$ that minimises

$$\|\mathbf{y}_j - \mathbf{X}\mathbf{w}_{kj}\|^2_{\mathbf{M}_k} + \mathbb{E}_\alpha(\alpha_k)\|\mathbf{w}_{kj}\|^2, \qquad (7.101)$$

where $\|\mathbf{v}\|^2_{\mathbf{M}_k} = \mathbf{v}^T\mathbf{M}_k\mathbf{v}$ is the norm weighted by the diagonal matching matrix $\mathbf{M}_k = \operatorname{diag}(m_k(\mathbf{x}_1), \dots, m_k(\mathbf{x}_N))$.
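In other words, the update is a matching-weighted ridge regression: setting the gradient of (7.101) to zero recovers exactly (7.97) and (7.98). The following is a quick numerical check of this equivalence, my own illustration on arbitrary toy data rather than anything from the text:

```python
import numpy as np

rng = np.random.default_rng(0)
N, D_X = 50, 3
X = rng.normal(size=(N, D_X))
y_j = rng.normal(size=N)
m_k = rng.uniform(size=N)   # matching values m_k(x_n)
E_alpha_k = 2.0             # stand-in for E_alpha(alpha_k)

M_k = np.diag(m_k)
Lambda_k = E_alpha_k * np.eye(D_X) + X.T @ M_k @ X   # (7.97)
w_kj = np.linalg.solve(Lambda_k, X.T @ (m_k * y_j))  # (7.98)

# Gradient of (7.101) at w_kj:
#   -2 X^T M_k (y_j - X w_kj) + 2 E_alpha(alpha_k) w_kj
grad = -2 * X.T @ (m_k * (y_j - X @ w_kj)) + 2 * E_alpha_k * w_kj
assert np.allclose(grad, 0)  # w_kj is indeed the minimiser of (7.101)
```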