A Learning Classifier Systems Model - Design and Analysis of Learning Classifier Systems - page 31


noisy measurements, the data is almost certainly not completely correct. Hence, we want to find a model that represents the general pattern in the training data but does not model its noise. The field that deals with this issue is known as model selection. Learning a model such that it perfectly fits the training set but does not provide a good representation of f is known as overfitting. The opposite, that is, learning a model where the structural bias of the model dominates over the information included from the training set, is called underfitting. While several heuristics have been applied in LCS to deal with this issue, it has never been characterised explicitly. In this and the following chapters, the aim is taken to be the minimisation of the empirical risk. In Chap. 7, we return to the topic of model selection and show how it can be handled for LCS in a principled manner.

[Figure 3.1. Panel (a): Observed f(x), Real f(x), and the 1st-, 2nd-, 4th-, and 10th-order fits, plotted against x. Panel (b): Empirical Risk and Expected Risk, plotted against Degree of Polynomial.]

Fig. 3.1. Comparing the fit of polynomials of various degrees to 100 noisy observations of a 2nd-order polynomial. (a) shows the data-generating function, the available observations, and the least-squares fit of polynomials of degree 1, 2, 4, and 10. (b) shows how the expected and empirical risk changes with the degree of the polynomial. More information is given in Example 3.1.

Example 3.1 (Expected and Empirical Risk of Fitting Polynomials of Various Degree). Consider the data-generating function $f(x) = 1/3 - x/2 + x^2$, whose observations, taken over the range $x \in [0, 1]$, are perturbed by Gaussian noise with a standard deviation of 0.1. Assuming no knowledge of $f(x)$, and given only its observations, let us hypothesise that the data was indeed generated by a polynomial of some degree $d$, as described by the model

$$f_d(x; \theta) = \sum_{n=0}^{d} \theta_n x^n, \tag{3.3}$$

where $\theta \in \mathbb{R}^{d+1}$ is the parameter vector of that model. The aim is to find the degree $d$ that best describes the given observations.
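
The setup of Example 3.1 can be tried out numerically. The following is a minimal sketch rather than the procedure used to produce Fig. 3.1. It assumes squared loss, inputs drawn uniformly from [0, 1], a fixed random seed, and an expected risk approximated on a dense grid with the noise variance added. The function names (`fit`, `empirical_risk`, `expected_risk`, `risks_by_degree`) are hypothetical and chosen for illustration only.

```python
import numpy as np

NOISE_STD = 0.1  # standard deviation of the observation noise (Example 3.1)


def f(x):
    """Data-generating function of Example 3.1."""
    return 1.0 / 3.0 - x / 2.0 + x ** 2


def design_matrix(x, d):
    """Matrix with columns x^0, x^1, ..., x^d."""
    return np.vander(x, d + 1, increasing=True)


def fit(x, y, d):
    """Least-squares estimate of theta for a polynomial of degree d."""
    theta, *_ = np.linalg.lstsq(design_matrix(x, d), y, rcond=None)
    return theta


def empirical_risk(theta, x, y):
    """Mean squared error of the fitted polynomial on the observations."""
    d = len(theta) - 1
    return float(np.mean((design_matrix(x, d) @ theta - y) ** 2))


def expected_risk(theta, grid_size=10_001):
    """Approximate expected squared error for x uniform on [0, 1].

    Assumption: the mean squared distance to f on a dense grid, plus the
    noise variance, stands in for the expectation over new observations.
    """
    xs = np.linspace(0.0, 1.0, grid_size)
    d = len(theta) - 1
    model_error = np.mean((design_matrix(xs, d) @ theta - f(xs)) ** 2)
    return float(model_error + NOISE_STD ** 2)


def risks_by_degree(max_degree=10, n_obs=100, seed=0):
    """Return {d: (empirical risk, expected risk)} for d = 0..max_degree."""
    rng = np.random.default_rng(seed)  # seed is an arbitrary choice
    x = rng.uniform(0.0, 1.0, n_obs)   # assumed sampling of the inputs
    y = f(x) + rng.normal(0.0, NOISE_STD, n_obs)
    result = {}
    for d in range(max_degree + 1):
        theta = fit(x, y, d)
        result[d] = (empirical_risk(theta, x, y), expected_risk(theta))
    return result


if __name__ == "__main__":
    for d, (emp, exp) in risks_by_degree().items():
        print(f"degree {d:2d}: empirical {emp:.5f}  expected {exp:.5f}")
```

Because polynomials of lower degree are special cases of those of higher degree, the empirical risk of the least-squares fit cannot increase with d. The expected risk, in contrast, depends on how well the fit generalises beyond the 100 observations, which is why the two quantities can suggest different degrees.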
