The true function $f(x)$ and the given observations are shown in Fig. 3.1(a), together with fitted polynomials of degree 1, 2, 4, and 10, using the loss function $L(y, \hat{y}) = (y - \hat{y})^2$. The 1st-degree polynomial $f_1$ (that is, the straight line) clearly underfits the data. This is confirmed by its high expected and empirical risk when compared to the other models, as shown in Fig. 3.1(b). On the other hand, the 2nd-degree polynomial $f_2$, which conforms to the true data-generating model, represents the data well and is close to $f(x)$ (but not equivalent, due to the finite number of observations). Still, having no knowledge of $f(x)$, one has no reason to stop at $d = 2$, particularly when observing in Fig. 3.1(b) that increasing $d$ reduces the empirical risk further. The expected risk, however, rises, which indicates that the models start to overfit the data by modelling its noise. This is clearly visible in the fit of $f_{10}$ in Fig. 3.1(a), which is closer to the observations than $f_2$, but further away from $f$.
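This experiment is easy to reproduce. Below is a minimal sketch, not taken from the text: the quadratic stand-in for $f(x)$, the noise level, and the sample size are all assumptions, chosen only to make the under/overfitting pattern visible.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed stand-in for the true function f(x): a 2nd-degree polynomial.
f = lambda x: 1.0 - 2.0 * x + 0.5 * x**2

# N noisy observations y_n = f(x_n) + Gaussian noise.
N = 15
x = rng.uniform(-1.0, 3.0, N)
y = f(x) + rng.normal(0.0, 0.3, N)

for d in (1, 2, 4, 10):
    # Least-squares fit of a degree-d polynomial, i.e. minimising the
    # empirical risk under the squared loss L(y, y_hat) = (y - y_hat)^2.
    f_d = np.poly1d(np.polyfit(x, y, d))

    emp_risk = np.mean((y - f_d(x)) ** 2)   # empirical risk on the data
    # Squared distance to the true f over a dense grid: the expected
    # risk minus the irreducible noise term.
    x_test = np.linspace(-1.0, 3.0, 500)
    dist_to_f = np.mean((f(x_test) - f_d(x_test)) ** 2)
    print(f"d = {d:2d}: empirical risk = {emp_risk:.4f}, "
          f"distance to f = {dist_to_f:.4f}")
```

Running this, one typically sees the empirical risk fall monotonically with $d$ while the distance to $f$ is smallest around $d = 2$, mirroring Fig. 3.1(b).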
The trend of the expected and the empirical risk in Fig. 3.1(b) is a common one: an increase in model complexity (represented in our case by $d$) generally causes a decrease in the empirical risk. The expected risk, however, only decreases up to a certain model complexity, beyond which it starts to increase as the model overfits the data. Thus, the aim is to identify the model that minimises the expected risk, which is complicated by the fact that this risk measure is usually not directly accessible. One needs to resort to the empirical risk in combination with some measure of the complexity of the model, and finding such a complexity measure is what makes identifying the best model a non-trivial problem.
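The text develops its own answer to this problem later; purely as an illustration of one common practical stand-in for the inaccessible expected risk, the following sketch estimates it by k-fold cross-validation. The data arrays `x` and `y` are hypothetical, and cross-validation is a standard technique, not the method of this text.

```python
import numpy as np

def cv_risk(x, y, d, k=5):
    """Estimate the expected risk of a degree-d polynomial fit by
    k-fold cross-validation: average squared loss on held-out folds."""
    folds = np.array_split(np.random.permutation(len(x)), k)
    losses = []
    for fold in folds:
        train = np.setdiff1d(np.arange(len(x)), fold)
        f_d = np.poly1d(np.polyfit(x[train], y[train], d))
        # Squared loss on data the model has not seen during fitting.
        losses.append(np.mean((y[fold] - f_d(x[fold])) ** 2))
    return np.mean(losses)

# Hypothetical usage: pick the degree with the lowest estimated risk.
# best_d = min(range(1, 11), key=lambda d: cv_risk(x, y, d))
```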
3.1.2 Regression
Both regression and classification tasks aim at finding a hypothesis for the data-generating process such that some risk measure is minimised, but they differ in the nature of the input and output spaces. A regression task is characterised by a multidimensional real-valued input space $\mathcal{X} = \mathbb{R}^{D_X}$ with $D_X$ dimensions and a multidimensional real-valued output space $\mathcal{Y} = \mathbb{R}^{D_Y}$ with $D_Y$ dimensions. Thus, the inputs are column vectors $\mathbf{x} = (x_1, \dots, x_{D_X})^T$ and the corresponding outputs are column vectors $\mathbf{y} = (y_1, \dots, y_{D_Y})^T$. In the case of batch learning it is assumed that $N$ observations $(\mathbf{x}_n, \mathbf{y}_n)$ are available in the form of the input matrix $\mathbf{X}$ and output matrix $\mathbf{Y}$,

$$
\mathbf{X} = \begin{pmatrix} \mathbf{x}_1^T \\ \vdots \\ \mathbf{x}_N^T \end{pmatrix},
\qquad
\mathbf{Y} = \begin{pmatrix} \mathbf{y}_1^T \\ \vdots \\ \mathbf{y}_N^T \end{pmatrix}.
\tag{3.4}
$$
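In code, the arrangement in (3.4) amounts to stacking each observation as a row; a minimal sketch with hypothetical dimensions and random stand-in data:

```python
import numpy as np

N, D_X, D_Y = 100, 3, 2   # hypothetical sizes
rng = np.random.default_rng(0)
observations = [(rng.random(D_X), rng.random(D_Y)) for _ in range(N)]

# Each observation pair (x_n, y_n) contributes one row, matching
# the row-wise stacking of transposed column vectors in (3.4).
X = np.vstack([x_n for x_n, _ in observations])   # shape (N, D_X)
Y = np.vstack([y_n for _, y_n in observations])   # shape (N, D_Y)
```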
The loss function is commonly the $L_2$ norm, also known as the Euclidean distance, and is defined by $L_2(\mathbf{y}, \hat{\mathbf{y}}) \equiv \|\mathbf{y} - \hat{\mathbf{y}}\|_2 = \left( \sum_i (y_i - \hat{y}_i)^2 \right)^{1/2}$. Hence, the loss increases quadratically in all dimensions with the distance from the desired value. Alternatively, the $L_1$ norm, also known as the absolute distance and defined by $L_1(\mathbf{y}, \hat{\mathbf{y}}) \equiv \|\mathbf{y} - \hat{\mathbf{y}}\|_1 = \sum_i |y_i - \hat{y}_i|$, can be used, in which case the loss increases linearly with the distance in each dimension.
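As a minimal sketch, both losses are one-liners in NumPy (the $L_1$ form matches the standard definition given above):

```python
import numpy as np

def l2_loss(y, y_hat):
    # L2 norm: per-dimension deviations enter squared.
    return np.sqrt(np.sum((np.asarray(y) - np.asarray(y_hat)) ** 2))

def l1_loss(y, y_hat):
    # L1 norm: per-dimension deviations enter as absolute values.
    return np.sum(np.abs(np.asarray(y) - np.asarray(y_hat)))
```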