and maximally accurate classifiers by combining a generality measure, given by
the proportion of overall examples correctly classified, and an error measure
that is inversely proportional to the number of correct positive classifications
over all classification attempts of a rule². The trade-off between generality and
error is handled by a constant γ that needs to be tuned. Thus, as in XCS, it is
dependent on a system parameter that is to be set by the user. Additionally, in
its current form, CCS aims at evolving rules that are completely accurate, and
is thus unable to cope with noisy data [153, 154]. The set of classifiers it aims
for can be described as the smallest set of classifiers that has the best trade-off
between error and generality, as controlled by the parameter γ.
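Schematically, such a combined fitness might look as follows; the function, its argument names, and the specific way the two measures are combined are illustrative assumptions, not CCS's actual definition:

```python
def ccs_style_fitness(n_correct, n_examples, n_pos_correct, n_attempts, gamma):
    """Illustrative combination of a generality measure and an error
    measure, traded off by the constant gamma (all names hypothetical).

    generality: proportion of all examples the rule classifies correctly.
    error: inversely related to the proportion of correct positive
           classifications over all classification attempts.
    """
    generality = n_correct / n_examples
    accuracy = n_pos_correct / n_attempts if n_attempts > 0 else 0.0
    error = 1.0 - accuracy  # one possible "inverse" of the accuracy
    # gamma tunes the trade-off between generality and error
    return generality - gamma * error
```

Larger γ penalises inaccurate rules more strongly; smaller γ favours general rules, which illustrates why the appropriate value must be tuned by the user.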
7.1.2 Model Selection
Due to the shortcomings of the previously discussed LCS, these will not be
considered when defining the optimal set of classifiers. Rather, existing concepts
from current model selection methods will be used. Even though most of these
methods have different philosophical backgrounds, they all result in the principle
of minimising a combination of the model error and a measure of the model
complexity. To provide good model selection it is essential to use a good model
complexity measure, and it has been shown that, generally, methods that consider
the distribution of the data when judging the model complexity outperform
methods that do not [125]. Furthermore, it is also advantageous to use the full
training data rather than an independent test set [13].
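As a concrete instance of this principle, a BIC-style penalised score adds a complexity term to the model error; BIC is used here purely for illustration and is not the criterion adopted in this chapter:

```python
import math

def bic_score(neg_log_likelihood, n_params, n_data):
    """Penalised score: an error term (twice the negative log-likelihood)
    plus a complexity term that grows with the number of parameters
    and the amount of data. Lower is better."""
    return 2.0 * neg_log_likelihood + n_params * math.log(n_data)

# Hypothetical candidates: (negative log-likelihood, number of parameters).
candidates = {"simple": (120.0, 3), "complex": (110.0, 12)}
n_data = 200
best = min(candidates, key=lambda m: bic_score(*candidates[m], n_data))
```

Here the more complex model fits the data better, but its extra parameters are penalised strongly enough that the simpler structure is preferred.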
Bayesian model selection meets these requirements and has additionally already
been applied to the Mixtures-of-Experts model [227, 20, 216]. This makes
it an obvious choice as a model selection criterion for LCS. A short discussion of
alternative model selection criteria that might be applicable to LCS is provided
in Sect. 7.6, later in this chapter.
7.1.3 Bayesian Model Selection
Given a model structure M and the data D, Bayesian model selection is based on
finding the probability density of the model structure given the data by Bayes'
rule

    p(M|D) ∝ p(D|M) p(M),                                    (7.1)

where p(M) is the prior over the set of possible model structures. The “best”
model structure given the data is the one with the highest probability density
p(M|D).
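In practice, candidate structures are conveniently compared by the unnormalised log posterior log p(D|M) + log p(M), which by (7.1) ranks them identically to p(M|D). The following sketch assumes the log evidence of each candidate has already been computed; all names and values are hypothetical:

```python
import math

def best_structure(log_evidence, log_prior):
    """Return the structure M maximising log p(D|M) + log p(M).

    By Bayes' rule this is the structure with the highest posterior
    density p(M|D), up to the structure-independent constant p(D).
    """
    return max(log_evidence, key=lambda M: log_evidence[M] + log_prior[M])

# Hypothetical candidates with precomputed log evidences and a uniform prior.
log_evidence = {"M1": -50.0, "M2": -48.0}
log_prior = {"M1": math.log(0.5), "M2": math.log(0.5)}
```

With a uniform prior over structures, as here, the comparison reduces to comparing the evidences p(D|M) alone.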
The data-dependent term p(D|M) is a likelihood known as the evidence for
model structure M, and is, for a parametric model with parameters θ, evaluated
by
² In [153, 154], the generality measure is called the accuracy, and the ratio of positive
correct classifications over the total number of classification attempts is the error,
despite it being some inverse measure of the error.