model with respect to the available data is a common task in machine learning,
known as model selection. Hence, the complex problem of defining the optimal
set of classifiers can be reduced to identifying a suitable model and applying
it. This is the approach taken in the rest of this chapter.
First, let us consider the question of optimality and, more generally, which model
properties are desirable. To identify good sets of classifiers by Bayesian model
selection, the LCS model is reformulated as a fully Bayesian model for regres-
sion; classification is handled in a later section. Subsequently, a longer, more
technical section demonstrates how variational Bayesian inference is applied to
find closed-form approximations to the posterior distributions. This also yields
a closed-form expression for the quality of a particular model structure, which
allows us to compare how well different LCS model structures explain the
available data. As such, this chapter provides the first general (that is,
representation-independent) definition of optimality for a set of classifiers, and
with it an answer to the question of what LCS want to learn.
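The idea of scoring a model structure in closed form and comparing candidates can be illustrated with a small sketch. The example below is not the book's variational method: it uses synthetic data and the Bayesian Information Criterion (BIC) as a simple stand-in for an approximation to the log model evidence, with polynomial degree playing the role of the model structure.

```python
# Illustrative sketch only (not the book's method): comparing candidate
# model structures by an approximate log model evidence. BIC stands in
# here for the variational bound derived later in the chapter.
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 50)
y = np.sin(2 * np.pi * x) + rng.normal(0.0, 0.2, x.size)  # synthetic data

def bic_score(degree):
    # Fit a polynomial of the given degree by least squares.
    coeffs = np.polyfit(x, y, degree)
    resid = y - np.polyval(coeffs, x)
    n, k = x.size, degree + 1
    sigma2 = resid @ resid / n                      # ML noise variance
    log_lik = -0.5 * n * (np.log(2 * np.pi * sigma2) + 1)
    return log_lik - 0.5 * k * np.log(n)            # higher is better

# The structure with the highest score balances fit against complexity.
scores = {d: bic_score(d) for d in range(1, 8)}
best = max(scores, key=scores.get)
```

A pure goodness-of-fit criterion would always prefer the highest degree; the complexity penalty is what lets the score select an intermediate structure.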
7.1
What Is Optimal?
Let us consider two extremes: N classifiers, such that each observation is matched
by exactly one classifier, or a single classifier that matches all inputs. In the first
case, each classifier replicates its associated observation exactly, and so the
whole set of classifiers represents the data with complete accuracy; it has
optimal goodness-of-fit. Methods that minimise the empirical risk, such as
maximum likelihood or squared-error minimisation, would evaluate such a set
as optimal. However, it provides no generalisation for noisy data, as it does
not differentiate between noise and the underlying pattern. In other words,
having one classifier per observation gives us no more information than the
data itself, and is therefore not a desirable solution.
Using a single classifier that matches all inputs, on the other hand, is the
simplest LCS model structure, but it has very low expressive power. That is,
it can express only very simple patterns in the data, and will very likely have
a poor goodness-of-fit. Thus, finding a good set of classifiers involves balancing
the goodness-of-fit of the set against its complexity, which determines its
expressive power. Every method that avoids overfitting must somehow express
this trade-off.
7.1.1
Current LCS Approaches
XCS has the ability to find a set of classifiers that generalises over the available
data [237, 238], as do YCS [33] and CCS [153, 154]. This means that they
do not simply minimise the overall model error but have some built-in model
selection capability, however crude it might be.
Let us first consider XCS: its ability to generalise is brought about by a
combination of the accuracy-definition of a classifier and the operation of its