and maximally accurate classifiers by combining a generality measure, given by
the proportion of overall examples correctly classified, and an error measure
that is inversely proportional to the number of correct positive classifications
over all classification attempts of a rule². The trade-off between generality and
error is handled by a constant γ that needs to be tuned. Thus, as in XCS, it is
dependent on a system parameter that is to be set by the user. Additionally, in
its current form, CCS aims at evolving rules that are completely accurate, and
is thus unable to cope with noisy data [153, 154]. The set of classifiers it aims
for can be described as the smallest set of classifiers that has the best trade-off
between error and generality, as controlled by the parameter γ.
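Schematically, such a combined fitness might look as follows; the function, its argument names, and the specific way the two measures are combined are illustrative assumptions, not CCS's actual definition:

```python
def ccs_style_fitness(n_correct, n_examples, n_pos_correct, n_attempts, gamma):
    """Illustrative combination of a generality measure and an error
    measure, traded off by the constant gamma (all names hypothetical).

    generality: proportion of all examples the rule classifies correctly.
    error: inversely related to the proportion of correct positive
           classifications over all classification attempts.
    """
    generality = n_correct / n_examples
    accuracy = n_pos_correct / n_attempts if n_attempts > 0 else 0.0
    error = 1.0 - accuracy  # one possible "inverse" of the accuracy
    # gamma tunes the trade-off between generality and error
    return generality - gamma * error
```

Larger γ penalises inaccurate rules more strongly; smaller γ favours general rules, which illustrates why the appropriate value must be tuned by the user.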
7.1.2 Model Selection
Due to the shortcomings of the previously discussed LCS, these will not be
considered when defining the optimal set of classifiers. Rather, existing concepts
from current model selection methods will be used. Even though most of these
methods have different philosophical backgrounds, they all result in the principle
of minimising a combination of the model error and a measure of the model
complexity. To provide good model selection it is essential to use a good model
complexity measure, and it has been shown that, generally, methods that consider
the distribution of the data when judging the model complexity outperform
methods that do not [125]. Furthermore, it is also advantageous to use the full
training data rather than an independent test set [13].
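As a concrete instance of this principle, a BIC-style penalised score adds a complexity term to the model error; BIC is used here purely for illustration and is not the criterion adopted in this chapter:

```python
import math

def bic_score(neg_log_likelihood, n_params, n_data):
    """Penalised score: an error term (twice the negative log-likelihood)
    plus a complexity term that grows with the number of parameters
    and the amount of data. Lower is better."""
    return 2.0 * neg_log_likelihood + n_params * math.log(n_data)

# Hypothetical candidates: (negative log-likelihood, number of parameters).
candidates = {"simple": (120.0, 3), "complex": (110.0, 12)}
n_data = 200
best = min(candidates, key=lambda m: bic_score(*candidates[m], n_data))
```

Here the more complex model fits the data better, but its extra parameters are penalised strongly enough that the simpler structure is preferred.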
Bayesian model selection meets these requirements and has additionally already
been applied to the Mixtures-of-Experts model [227, 20, 216]. This makes
it an obvious choice as a model selection criterion for LCS. A short discussion of
alternative model selection criteria that might be applicable to LCS is provided
in Sect. 7.6, later in this chapter.
7.1.3 Bayesian Model Selection
Given a model structure M and the data D, Bayesian model selection is based on
finding the probability density of the model structure given the data by Bayes'
rule

    p(M|D) ∝ p(D|M) p(M),                                    (7.1)

where p(M) is the prior over the set of possible model structures. The “best”
model structure given the data is the one with the highest probability density
p(M|D).
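In practice, candidate structures are conveniently compared by the unnormalised log posterior log p(D|M) + log p(M), which by (7.1) ranks them identically to p(M|D). The following sketch assumes the log evidence of each candidate has already been computed; all names and values are hypothetical:

```python
import math

def best_structure(log_evidence, log_prior):
    """Return the structure M maximising log p(D|M) + log p(M).

    By Bayes' rule this is the structure with the highest posterior
    density p(M|D), up to the structure-independent constant p(D).
    """
    return max(log_evidence, key=lambda M: log_evidence[M] + log_prior[M])

# Hypothetical candidates with precomputed log evidences and a uniform prior.
log_evidence = {"M1": -50.0, "M2": -48.0}
log_prior = {"M1": math.log(0.5), "M2": math.log(0.5)}
```

With a uniform prior over structures, as here, the comparison reduces to comparing the evidences p(D|M) alone.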
The data-dependent term p(D|M) is a likelihood known as the evidence for
model structure M, and is, for a parametric model with parameters θ, evaluated
by
² In [153, 154], the generality measure is called the accuracy, and the ratio of positive
correct classifications over the total number of classification attempts is the error,
despite it being some inverse measure of the error.