The Optimal Set of Classifiers - Design and Analysis of Learning Classifier Systems


In its crudest form, the two-part MDL requires a binary representation of both the model error and the model itself, where the length of the combined representation is to be minimised [188, 189]. Using such an approach for LCS makes its performance highly dependent on the representation used for the matching functions and the model parameters, and the resulting model selection is therefore rather arbitrary. Its dependence on the chosen representation and the lack of guidelines on how to decide upon a particular representation are generally considered the biggest weakness of the two-part MDL [101].
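
To illustrate this dependence on the representation, the following minimal Python sketch computes a crude two-part code length as the number of bits spent on the model parameters plus the number of bits spent on the residuals. Everything in it is an assumption made for illustration and is not taken from the text: the function names, the fixed number of bits per parameter, the Gaussian noise model with unit variance, the discretisation resolution, and the made-up residuals of a 2-parameter and a 6-parameter model.

```python
import math


def two_part_code_length(residuals, n_params, bits_per_param,
                         noise_std=1.0, resolution=0.01):
    """Crude two-part code length in bits: L(model) + L(data | model)."""
    # Model part: every parameter is stored at a fixed, arbitrarily
    # chosen precision (hypothetical encoding).
    model_bits = n_params * bits_per_param
    # Error part: Shannon code length of the residuals under a Gaussian
    # noise model, discretised to the given resolution.
    nll_nats = sum(
        0.5 * math.log(2 * math.pi * noise_std ** 2)
        + r ** 2 / (2 * noise_std ** 2)
        for r in residuals
    )
    error_bits = nll_nats / math.log(2) - len(residuals) * math.log2(resolution)
    return model_bits + error_bits


# Made-up residuals on 25 observations: the simple model fits poorly,
# the complex model fits well.
simple_residuals = [2.0, -2.0] * 12 + [2.0]
complex_residuals = [0.2, -0.2] * 12 + [0.2]


def preferred_model(bits_per_param):
    """Name of the model with the shorter total code at this precision."""
    simple = two_part_code_length(simple_residuals, 2, bits_per_param)
    complex_ = two_part_code_length(complex_residuals, 6, bits_per_param)
    return "simple" if simple < complex_ else "complex"


for bits in (8, 32):
    print(bits, "bits per parameter ->", preferred_model(bits))
```

With these made-up numbers, the 6-parameter model has the shorter total code at 8 bits per parameter and the 2-parameter model at 32 bits per parameter, although neither the data nor the models have changed. Only the arbitrary choice of parameter encoding differs.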

A more refined approach is to use the Bayesian MDL [101] that, despite a different philosophical background, is mathematically identical to Bayesian model selection as applied here. In that sense, the approach presented in this chapter can be said to be using the Bayesian MDL model selection criterion.

The latest MDL approach is theoretically optimal in the sense that it minimises the worst-case coding length of the model. Mathematically, it is expressed as the maximum likelihood normalised by the model complexity, where the model complexity is its coding length summed over all possible model parameter values [191]. Therefore, given continuous model parameters, as used here, the complexity is infinite, which makes model comparison impossible. In addition, the LCS structure makes computing the model complexity even for a finite set of parameters extremely complicated, which makes it unlikely that, in its pure form, the latest MDL measure will be of any use for LCS.

7.6.2 Structural Risk Minimisation

Structural Risk Minimisation (SRM) is based on minimising an upper bound on the expected risk (3.1), given by the sum of the empirical risk (3.2) and a model complexity metric based on the functional form of the model [218]. The functional form of the model complexity enters SRM in the form of the model's Vapnik-Chervonenkis (VC) dimensions. Having the empirical risk and the VC dimensions of the model, we can find a model that minimises the bound on the expected risk.
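
As a rough illustration of how the empirical risk and the VC dimension combine, the sketch below uses the classical VC bound for binary classification with 0-1 loss: with probability at least 1 - eta, the expected risk is at most the empirical risk plus sqrt((h (ln(2N/h) + 1) - ln(eta/4)) / N), for VC dimension h and N observations. Choosing this particular bound is an assumption made for illustration; the regression setting considered here would require a different bound of the same flavour. The function names and the candidate models with their empirical risks are hypothetical.

```python
import math


def vc_confidence(h, n, eta=0.05):
    """Complexity term of the classical VC bound (0-1 loss, requires h < n)."""
    return math.sqrt((h * (math.log(2 * n / h) + 1) - math.log(eta / 4)) / n)


def srm_select(candidates, n, eta=0.05):
    """Return (name, bound) of the candidate with the smallest risk bound.

    candidates: iterable of (name, empirical_risk, vc_dimension) tuples.
    """
    bounds = [(name, emp_risk + vc_confidence(h, n, eta))
              for name, emp_risk, h in candidates]
    return min(bounds, key=lambda pair: pair[1])


# Hypothetical nested model classes: a richer class lowers the empirical
# risk but raises the complexity term.
candidates = [
    ("h=2", 0.30, 2),
    ("h=10", 0.12, 10),
    ("h=100", 0.02, 100),
]
name, bound = srm_select(candidates, n=1000)
print(name, round(bound, 3))
```

In this made-up setting the intermediate class is selected: the richest class has the lowest empirical risk, but its complexity term alone exceeds 0.6 for 1000 observations, so its bound is the largest of the three.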

The difficulty of the SRM approach when applied to LCS is to find the VC dimensions of the LCS model. For linear regression classifiers, the VC dimensions are simply the dimensionality of the input space D_X. Mixing these models, however, introduces non-linearity that makes evaluation of the VC dimensions difficult. An additional weakness of SRM is that it deals with worst-case bounds that apply to any distribution of the data, which causes the bound on the expected risk to be quite loose and reduces its usefulness for model selection [19].

A more powerful approach that provides us with a tighter bound on the expected risk is to use data-dependent SRM. Such an approach has been applied to the Mixtures-of-Experts model by Azran et al. [5, 4]. It remains to be seen whether this approach can be generalised to the LCS model, as was done here with the Bayesian MoE model to provide the Bayesian LCS model. If this is possible, data-dependent SRM might be a viable alternative for defining the optimal set of classifiers.
