usually not directly accessible. The graph from Fig. 3.1(b) that shows how both
risk measures change with d is reproduced in Fig. 7.1(a) for convenience.
Using a Bayesian model of the data-generating process, one can assess the probability of the data supporting the polynomial having a particular degree by Bayesian model selection. The model used for this task is the same as the one later introduced for linear regression classifiers and is thus not discussed in detail here. Variational Bayesian inference, as described in Sect. 7.3.1, is used to evaluate a lower "variational" bound $\mathcal{L}(q)$ on the model log-probability, that is, $\mathcal{L}(q) \leq \ln p(\mathcal{D}|\mathcal{M}) = \ln p(\mathcal{M}|\mathcal{D}) + \text{const.}$ under the assumption of a uniform model prior $p(\mathcal{M})$. As shown in Fig. 7.1(b), $\mathcal{L}(q)$ is highest for $d = 2$, which demonstrates that Bayesian model selection correctly identifies the data-generating model.
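As a concrete illustration of this kind of degree selection, the sketch below scores polynomial models of degree $d$ by their log-evidence, using a conjugate Bayesian linear regression model for which $\ln p(\mathcal{D}|\mathcal{M})$ has a closed form, so no variational bound is required. The hyperparameters (prior precision `alpha`, noise precision `beta`) and the synthetic degree-2 data are assumptions made for this example, not values taken from the text.

```python
import numpy as np

def log_evidence(X, y, alpha=1.0, beta=25.0):
    """Closed-form ln p(D|M) for Bayesian linear regression with
    prior w ~ N(0, alpha^-1 I) and known noise precision beta."""
    N, D = X.shape
    A = alpha * np.eye(D) + beta * X.T @ X        # posterior precision
    m = beta * np.linalg.solve(A, X.T @ y)        # posterior mean
    E = beta / 2 * np.sum((y - X @ m) ** 2) + alpha / 2 * m @ m
    _, logdetA = np.linalg.slogdet(A)
    return (D / 2 * np.log(alpha) + N / 2 * np.log(beta)
            - E - logdetA / 2 - N / 2 * np.log(2 * np.pi))

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 30)
y = 1 - 2 * x + 3 * x**2 + rng.normal(0, 0.2, x.size)   # degree-2 generator

for d in range(6):
    X = np.vander(x, d + 1, increasing=True)  # features 1, x, ..., x^d
    print(f"d = {d}: ln p(D|M) = {log_evidence(X, y):.1f}")
```

The printed evidence should peak at $d = 2$, mirroring Fig. 7.1(b): higher degrees fit the data no better but pay an automatic complexity penalty through the evidence.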
7.1.4 Applying Bayesian Model Selection to Finding the Best Set of Classifiers
Applied to LCS, the model structure is, as previously described, defined by the number of classifiers $K$ and their matching functions $\boldsymbol{M} = \{ m_k : \mathcal{X} \to [0, 1] \}$, giving $\mathcal{M} = \{ K, \boldsymbol{M} \}$. In order to find the best set of classifiers, we need to maximise its probability density with respect to the data (7.1), which is equivalent to maximising its logarithm,

$$\ln p(\mathcal{M}|\mathcal{D}) = \ln p(\mathcal{D}|\mathcal{M}) + \ln p(\mathcal{M}) + \text{const.}, \qquad (7.3)$$
where the constant term captures the normalising constant and can be ignored
when comparing the different model structures, as it is shared between them.
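In code, the comparison reduces to ranking candidate structures by the unnormalised score $\ln p(\mathcal{D}|\mathcal{M}) + \ln p(\mathcal{M})$ from (7.3); the shared constant never has to be computed. A minimal sketch, assuming a geometric prior over $K$ and using made-up evidence values purely as placeholders:

```python
import math

def structure_score(log_evidence, K):
    """Unnormalised ln p(M|D) from (7.3), dropping the shared constant.
    Assumes an illustrative geometric prior p(K) = (1/2)^K; any prior
    vanishing for large K would serve the same purpose."""
    return log_evidence - K * math.log(2)

# Hypothetical candidates: name -> (ln p(D|M), number of classifiers K).
candidates = {"M_a": (-120.4, 3), "M_b": (-118.9, 7)}
best = max(candidates, key=lambda name: structure_score(*candidates[name]))
print(best)  # structure with the highest unnormalised log-posterior
```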
Evaluating the log-evidence $\ln p(\mathcal{D}|\mathcal{M})$ in (7.3) requires us to first specify a parameter prior $p(\theta|\mathcal{M})$, and then to evaluate (7.2) to get the evidence of $\mathcal{M}$.
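A minimal sketch of what evaluating the evidence involves, assuming (7.2) is the usual marginalisation $p(\mathcal{D}|\mathcal{M}) = \int p(\mathcal{D}|\theta, \mathcal{M})\, p(\theta|\mathcal{M})\, d\theta$: draw parameters from the prior and average their likelihoods. This naive estimator is for illustration only; it scales poorly, which is one reason the chapter turns to variational methods.

```python
import numpy as np
from scipy.special import logsumexp

def mc_log_evidence(log_lik, sample_prior, n=10_000):
    """Naive Monte Carlo estimate of ln p(D|M): average the likelihood
    p(D|theta,M) over n parameter draws theta_i ~ p(theta|M)."""
    thetas = sample_prior(n)                     # samples from p(theta|M)
    return logsumexp([log_lik(t) for t in thetas]) - np.log(n)
```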
Unfortunately, the LCS model described in Chap. 4 is not fully Bayesian and needs to be reformulated before the evidence can be evaluated. Additionally, the resulting probabilistic model structure does not provide a closed-form solution to (7.2). Thus, the rest of this chapter is devoted to i) introducing a fully Bayesian LCS model, and ii) applying an approximation method called Variational Bayesian inference that gives us a closed-form expression for the evidence. Before we do so, let us discuss the prior $p(\mathcal{M})$ on the model structure itself, and why the requirement of specifying parameter and model structure priors is not an inherent weakness of the method.
7.1.5 The Model Structure Prior $p(\mathcal{M})$
Specifying the prior $p(\mathcal{M})$ lets us express our belief about which model structures are best at representing the data, prior to knowledge of the data. Recall that $\mathcal{M} = \{ \boldsymbol{M}, K \}$, and thus $p(\mathcal{M})$ can be decomposed into $p(\mathcal{M}) = p(\boldsymbol{M}|K)\, p(K)$. Our belief about the number of classifiers $K$ is that this number is certainly always finite, which requires $p(K) \to 0$ with $K \to \infty$.
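Any prior whose tail vanishes encodes this belief; a Poisson distribution over $K$ is one possibility, sketched below (the rate `lam` is an assumed hyperparameter, not one prescribed by the text):

```python
import math

def ln_p_K(K, lam=4.0):
    """Illustrative Poisson prior ln p(K) = K ln(lam) - lam - ln(K!),
    which decays to zero as K -> infinity, so arbitrarily large
    classifier sets are effectively ruled out a priori."""
    return K * math.log(lam) - lam - math.lgamma(K + 1)
```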
The beliefs about the set of matching functions $\boldsymbol{M}$ given some $K$ are less clear. Let us only