Function          O(·)                   Comments
ModelProbability  N K^3 D_X^3 D_Y D_V    K^3 D_V from TrainMixing, D_X^3 from TrainClassifier
TrainClassifier   N D_X^3 D_Y            D_X^3 due to Λ_k^-1
TrainMixing       N K^3 D_X^2 D_Y D_V    K^3 D_X^2 D_V from TrainMixWeights
Mixing            N K D_V
Responsibilities  N K D_X^2 D_Y          D_X^2 due to Λ_k^-1
TrainMixWeights   N K^3 D_X^2 D_Y D_V    (K D_V)^3 due to H^-1, D_X^2 from Responsibilities
Hessian           N K^2 D_V              K^2 due to nested iteration, D_V due to Φ^T(Φ ⊙ (g_k ⊙ g_j))
TrainMixPriors    K D_V
VarClBound        N D_X^2 D_Y            D_X^2 due to Λ_k^-1 or |Λ_k^-1|
VarMixBound       N K^2 D_V              (K D_V)^2 due to |Λ_V^-1|
Fig. 8.1. Complexity of the different functions with respect to the number of observations N, the number of classifiers K, the dimensionality of the input space D_X, the dimensionality of the output space D_Y, and the dimensionality of the mixing feature space D_V
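To make the figures in the table concrete, the following minimal sketch (with purely hypothetical problem sizes) evaluates the dominant term of ModelProbability and shows that doubling the number of classifiers K multiplies the cost by eight, independently of the other dimensions:

```python
# Rough cost model based on the dominant term of ModelProbability in Fig. 8.1.
# The problem sizes below are hypothetical and only illustrate the cubic
# growth in the number of classifiers K.

def model_probability_cost(N, K, D_X, D_Y, D_V):
    """Dominant operation count O(N K^3 D_X^3 D_Y D_V) from Fig. 8.1."""
    return N * K**3 * D_X**3 * D_Y * D_V

base = model_probability_cost(N=1000, K=10, D_X=2, D_Y=1, D_V=1)
doubled = model_probability_cost(N=1000, K=20, D_X=2, D_Y=1, D_V=1)
print(doubled / base)  # 8.0: doubling K makes the call 8 times as expensive
```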
To apply variational inference to real-world problems, the algorithm would be required to scale linearly with the number of classifiers K. This is best achieved by approximating the optimal mixing weights by well-tuned heuristics, as was already done for the prior-free LCS model in Chap. 6. What remains to be done is to find similar heuristics that honour the prior. The mixing feature space dimensionality, on the other hand, is usually D_V = 1, and its influence is therefore negligible.
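As an illustration of the kind of heuristic meant here, the sketch below implements inverse-variance mixing in the spirit of the heuristics discussed for the prior-free model: each matched classifier is weighted by its estimated noise precision, at a cost of only O(NK). The array layout and the names M and noise_var are assumptions for illustration, and this variant does not yet honour the prior:

```python
import numpy as np

# Minimal sketch of a heuristic mixing model: mixing weights proportional to
# the inverse noise variance of each matched classifier. Names and array
# layout are illustrative assumptions, not the book's algorithm.

def inverse_variance_mixing(M, noise_var):
    """Return an (N, K) matrix of mixing weights.

    M         : (N, K) 0/1 matching matrix, M[n, k] = m_k(x_n)
    noise_var : (K,)   estimated noise variance of each classifier
    """
    G = M / noise_var            # weight matched classifiers by inverse variance
    G_sum = G.sum(axis=1, keepdims=True)
    G_sum[G_sum == 0.0] = 1.0    # observations matched by no classifier
    return G / G_sum             # normalise each row to sum to 1 (or 0)

# Cost is O(NK), compared with the O(N K^3 D_X^2 D_Y D_V) of TrainMixWeights.
```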
In summary, the presented algorithm scales with O(N K^3 D_X^3 D_Y D_V). While it might be possible to reduce D_X^3 to D_X^2, it still scales super-linearly with the number of classifiers K. This is due to the use of the generalised softmax function, which requires the application of the IRLS algorithm to find its parameters. To reduce the complexity, the softmax function needs to be replaced either by another model that is easier to train or by well-tuned heuristics that provide a good approximation to it.
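To see where the cubic scaling in K enters, consider a single IRLS/Newton update for a plain softmax gating model. This is a simplified stand-in for the generalised softmax, omitting the matching functions and the prior on the mixing weights, and all names are illustrative. The Hessian consists of K × K blocks of size D_V × D_V, so solving for the Newton step already costs O((K D_V)^3):

```python
import numpy as np

# One Newton/IRLS step for plain softmax gating weights. The (K D_V)-sized
# linear system at the end is the source of the cubic scaling in K.

def irls_step(V, Phi, R):
    """V: (K, D_V) mixing weights, Phi: (N, D_V) features, R: (N, K) targets."""
    D_V = Phi.shape[1]
    K = V.shape[0]

    A = Phi @ V.T                                   # (N, K) activations v_k^T phi(x_n)
    G = np.exp(A - A.max(axis=1, keepdims=True))
    G /= G.sum(axis=1, keepdims=True)               # softmax gating g_k(x_n)

    grad = (Phi.T @ (G - R)).T.reshape(K * D_V)     # stacked gradient, length K*D_V

    H = np.zeros((K * D_V, K * D_V))                # K x K blocks of size D_V x D_V
    for k in range(K):
        for j in range(K):
            w = G[:, k] * ((k == j) - G[:, j])      # g_k (delta_kj - g_j)
            H[k*D_V:(k+1)*D_V, j*D_V:(j+1)*D_V] = Phi.T @ (Phi * w[:, None])
    H += 1e-6 * np.eye(K * D_V)                     # ridge: the softmax Hessian is singular

    step = np.linalg.solve(H, grad)                 # the O((K D_V)^3) bottleneck
    return V - step.reshape(K, D_V)
```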
8.2 Two Alternatives for Model Structure Search
Recall that the optimal set of classifiers M was defined as the set that maximises p(M|D). Therefore, in order to find this optimal set we need to search the space {M} for the M̄ such that p(M̄|D) ≥ p(M|D) for all M. This can theoretically