Computing this probability density requires a Bayesian LCS model that was introduced by adding priors to the probabilistic model from Chap. 4. Additionally, the flexibility of the regression classifier model was increased from univariate to multivariate regression. The requirement of specifying prior parameters is not a weakness of this approach, but rather a strength, as the priors make explicit the commonly implicit assumptions made about the data-generating process.
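To make explicit where these priors enter, the model posterior follows from Bayes' rule in the standard way (the notation below is illustrative, with θ collecting all model parameters):

\[
p(\mathcal{M} \mid \mathcal{D}) \propto p(\mathcal{D} \mid \mathcal{M})\, p(\mathcal{M}),
\qquad
p(\mathcal{D} \mid \mathcal{M}) = \int p(\mathcal{D} \mid \boldsymbol{\theta}, \mathcal{M})\, p(\boldsymbol{\theta} \mid \mathcal{M})\, \mathrm{d}\boldsymbol{\theta},
\]

so the parameter prior p(θ|M) is precisely where the assumptions about the data-generating process are stated explicitly.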
Variational Bayesian inference was employed to find a closed-form solution to p(M|D), in combination with various approximations to handle the generalised softmax function that is used to combine the local classifier models into a global model. Whilst variational Bayesian inference usually provides a lower bound L(q) on ln p(D|M) that is directly related to p(M|D), these approximations invalidate the lower-bound nature of L(q). Even without these approximations, the use of L(q) for selecting the best set of classifiers depends very much on the tightness of the bound, and on whether this tightness is consistent across different model structures M. Variational Bayesian inference has been shown to perform well in practice [216, 19], and the same approximations that were used here were successfully applied to the Mixtures-of-Experts model [226, 227]. Thus, the presented method can be expected to perform well when applied to LCS, but more definite statements require further empirical investigation.
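For reference, the usual relationship between the variational bound and the model evidence is (again in illustrative notation, with q(θ) denoting the variational posterior):

\[
\ln p(\mathcal{D} \mid \mathcal{M})
= \mathcal{L}(q) + \mathrm{KL}\big(q(\boldsymbol{\theta}) \,\|\, p(\boldsymbol{\theta} \mid \mathcal{D}, \mathcal{M})\big),
\qquad
\mathcal{L}(q) = \int q(\boldsymbol{\theta}) \ln \frac{p(\mathcal{D}, \boldsymbol{\theta} \mid \mathcal{M})}{q(\boldsymbol{\theta})}\, \mathrm{d}\boldsymbol{\theta}.
\]

As the Kullback-Leibler divergence is non-negative, L(q) ≤ ln p(D|M); it is this guarantee that the additional softmax approximations invalidate.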
What was introduced in this chapter is the first formal and general definition
of what it means for a set of classifiers to be optimal, using the best applicable of
the currently common model selection approaches. The definition is general as
i) it is independent of the representation of the matching function, ii) it can be
used for both discrete and continuous input spaces, iii) it can handle matching
by degree, and iv) it is not restricted to the LCS model that is introduced
in this work but is applicable to all LCS model types that can be described
probabilistically, including the linear LCS model. The reader is reminded that
the definition itself is independent of the variational inference, and thus is not
affected by the issues that are introduced through approximating the posterior.
A further significant advancement that comes with the definition of optimality is
a Bayesian model for LCS that goes beyond the probabilistic model as it makes
the prior assumptions about the data-generating process explicit. Additionally,
the use of multivariate regression is a novelty in the LCS context.
Defining the best set of classifiers as a maximisation problem also promotes its theoretical investigation: depending on the LCS model type, one could, for example, ask whether the optimal set of classifiers is ever overlapping. In other words, does the optimal set of classifiers include classifiers that are responsible for the same input and thus have overlapping matching? If the removal of overlaps increases p(M|D) in all cases, then it does not. Such knowledge can guide model structure search itself, as it can avoid classifier constellations that are very likely suboptimal. Thus, further research in this area is not only of theoretical value but can also guide the design of other LCS components.
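To illustrate how such a result could prune model structure search, the following sketch greedily removes one classifier of each overlapping pair whenever doing so does not lower a generic model score, which stands in for L(q) or any other approximation of ln p(M|D). All names, the matching representation, and the greedy strategy are hypothetical and only illustrate the idea, not the method of this work:

from itertools import combinations

def overlaps(m1, m2, inputs):
    """Two classifiers overlap if they both match some common input."""
    return any(m1(x) and m2(x) for x in inputs)

def prune_overlaps(classifiers, score, inputs):
    """Greedily drop one classifier of an overlapping pair whenever the
    model score (a stand-in for L(q)) does not decrease."""
    current = list(classifiers)
    improved = True
    while improved:
        improved = False
        for a, b in combinations(current, 2):
            if not overlaps(a, b, inputs):
                continue
            for drop in (a, b):
                candidate = [c for c in current if c is not drop]
                if score(candidate) >= score(current):
                    current = candidate
                    improved = True
                    break
            if improved:
                break
    return current

# Hypothetical usage: interval-matching classifiers on a discretised input space.
xs = [i / 100 for i in range(101)]
classifiers = [lambda x, lo=lo, hi=hi: lo <= x < hi
               for (lo, hi) in [(0.0, 0.6), (0.4, 1.0)]]
toy_score = lambda cs: -len(cs)  # hypothetical score favouring fewer classifiers
print(len(prune_overlaps(classifiers, toy_score, xs)))  # -> 1

If removing overlaps could ever decrease the score, such a greedy search would no longer be safe; this is exactly why the theoretical question raised above matters for the design of the search.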