incremental learning approaches that lead to similar results will also be discussed.
Still, the prototype system that is developed is only fully described from the batch
learning perspective. How to turn this system into an incremental learner is a topic
of future research.
3.2 LCS as Parametric Models
While the term model may be used in many different ways, it is here defined
as a collection of possible hypotheses about the data-generating process. Hence,
the choice of model determines which hypotheses are available and therefore
biases what can be expressed about this process. Such a bias represents the
assumptions that are made about the process and its stochasticity. Understanding
the assumptions that a model introduces allows for making statements about its
applicability and performance.
Example 3.3 (Different Linear Models and their Assumptions). Assuming a linear
relation between inputs and outputs with constant-variance Gaussian noise leads
to least squares linear regression, that is, linear regression using the L2 loss
function. Alternatively, assuming the noise to follow a Cauchy distribution
results in linear regression using the L1 loss function. As the Cauchy
distribution has heavier tails than the Gaussian, the resulting estimator is
more resilient to outliers and is hence considered more robust, but the L1 norm
makes the model harder to train [66]. This shows how the assumptions that a
model makes about the data-generating process can give us information about its
expected performance.
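This difference in robustness can be sketched with a small illustration (the data and the grid-search procedure below are illustrative choices, not taken from the text): a one-parameter linear model y ≈ wx is fitted under both loss functions to data containing a single outlier. The L2 estimate is pulled towards the outlier, while the L1 estimate is not.

```python
import numpy as np

# ten points on the line y = 2x, with the last output corrupted by an outlier
x = np.arange(1.0, 11.0)
y = 2.0 * x
y[-1] += 50.0

# the L2 (least squares) estimate of w has the closed form (x'y)/(x'x)
w_l2 = (x @ y) / (x @ x)

# the L1 estimate has no closed form; a coarse grid search suffices here
ws = np.linspace(0.0, 5.0, 5001)
w_l1 = ws[np.argmin([np.abs(y - w * x).sum() for w in ws])]

print(w_l2)  # pulled well away from the true slope 2 by the single outlier
print(w_l1)  # stays at the true slope 2
```

The grid search stands in for the iterative schemes that L1 regression requires in practice, which is precisely the training difficulty the example alludes to.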
Training a model means finding the hypothesis that is closest to what the data-
generating process is assumed to be. In a linear regression model, for example,
the space of hypotheses comprises all hyperplanes in the input/output space, and
performing linear regression means picking the hyperplane that best explains the
available observations.
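This hypothesis-picking step can be sketched in a few lines (the data below are made up for illustration): for least squares linear regression, the best hyperplane has a closed-form solution, and on noise-free observations it recovers the generating hyperplane exactly.

```python
import numpy as np

# observations generated by the hyperplane y = 1 + 3*x1 - 2*x2, without noise
x1 = np.linspace(0.0, 1.0, 20)
x2 = x1 ** 2
X = np.column_stack([np.ones_like(x1), x1, x2])  # bias term included
w_true = np.array([1.0, 3.0, -2.0])
y = X @ w_true

# least squares picks the hyperplane minimising the sum of squared residuals
w_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

print(w_hat)  # recovers the generating coefficients [1, 3, -2]
```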
The choice of model also strongly determines how hard it is to train. While more
complex models are usually able to express a larger range of possible hypotheses,
this larger range also makes it easier for them to overfit the data. Hence,
overfitting by minimising the empirical risk is very often counterbalanced by
reducing the number of hypotheses that a model can express, which makes the
assumptions that the model introduces all the more important.
Example 3.4 (Avoiding Overfitting in Artificial Neural Networks). Reducing the
number of hidden neurons in a feed-forward neural network is a popular measure
for avoiding overfitting the training data. It effectively reduces the number of
hypotheses that the model is able to express and thus introduces a stronger
structural bias. Another approach to avoiding overfitting in neural network
training is weight decay, which exponentially decays the magnitudes of the
connection weights during training. While not initially designed as such, weight
decay is equivalent to assuming a zero-mean Gaussian prior on the weights and
hence biases them towards smaller values. This prior is in turn equivalent to
assuming smoothness of the target function [106].
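The equivalence between weight decay and a zero-mean Gaussian prior can be sketched on a linear model, where it is easiest to verify: gradient descent with an added weight-decay term converges to the ridge regression solution, which is exactly the MAP estimate under such a prior. The data, the decay constant, and the learning rate below are illustrative choices, not values from the text.

```python
import numpy as np

# toy regression data (illustrative)
x1 = np.linspace(0.0, 1.0, 20)
X = np.column_stack([x1, x1 ** 2])
y = 3.0 * x1 - 2.0 * x1 ** 2
n = len(y)

lam = 0.1  # decay constant; plays the role of the Gaussian prior's precision

# MAP solution under a zero-mean Gaussian prior (ridge regression)
w_ridge = np.linalg.solve(X.T @ X / n + lam * np.eye(2), X.T @ y / n)

# gradient descent on the squared error, with weight decay pulling w to zero
w = np.zeros(2)
lr = 0.3
for _ in range(20000):
    w -= lr * (X.T @ (X @ w - y) / n + lam * w)

# both routes arrive at the same, shrunken weight vector
w_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
print(w)                                                 # matches w_ridge
print(np.linalg.norm(w_ridge) < np.linalg.norm(w_ols))   # prior shrinks weights
```

The comparison with the unregularised least squares weights w_ols makes the bias visible: the prior shrinks the weight vector towards zero, which is the smoothness assumption mentioned above.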
 