In its crudest form, the two-part MDL requires a binary representation of both
the model error and the model itself, and the combined length of the two
representations is to be minimised [188, 189]. Applying such an approach to LCS
makes its performance highly dependent on the representation used for the
matching functions and the model parameters, and the resulting model ranking is
therefore rather arbitrary. This dependence on the chosen representation, and
the lack of guidelines on how to decide upon a particular representation, are
generally considered the biggest weakness of the two-part MDL [101].
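The representation dependence described above can be illustrated with a toy sketch. Everything in it is an assumption made for illustration and is not taken from the text: a piecewise-constant model with k equal-width bins stands in for a set of classifiers, each bin mean is charged a fixed number of bits (`bits_per_param`), and the error is coded with a Gaussian code length that is exact only up to a constant depending on the data precision.

```python
import math

def fit_bins(xs, ys, k):
    """Hypothetical model: k equal-width bins on [0, 1), one mean per bin."""
    sums, counts = [0.0] * k, [0] * k
    for x, y in zip(xs, ys):
        b = min(int(x * k), k - 1)
        sums[b] += y
        counts[b] += 1
    return [s / c if c else 0.0 for s, c in zip(sums, counts)]

def error_bits(xs, ys, means):
    """Gaussian code length of the residuals in bits (up to a precision constant)."""
    k, n = len(means), len(xs)
    rss = sum((y - means[min(int(x * k), k - 1)]) ** 2 for x, y in zip(xs, ys))
    var = max(rss / n, 1e-12)  # guard against log(0)
    return 0.5 * n * math.log2(2 * math.pi * math.e * var)

def two_part_length(xs, ys, k, bits_per_param):
    """Two-part code: bits for the model plus bits for the model error."""
    return k * bits_per_param + error_bits(xs, ys, fit_bins(xs, ys, k))

def best_k(xs, ys, bits_per_param, max_k=20):
    """Number of bins that minimises the combined code length."""
    return min(range(1, max_k + 1),
               key=lambda k: two_part_length(xs, ys, k, bits_per_param))

# Deterministic toy data: a sine wave plus a small deterministic perturbation.
n = 200
xs = [i / n for i in range(n)]
ys = [math.sin(2 * math.pi * x) + 0.1 * math.sin(37.0 * i) for i, x in enumerate(xs)]

k_coarse = best_k(xs, ys, bits_per_param=4)   # cheap parameters
k_fine = best_k(xs, ys, bits_per_param=64)    # expensive parameters
print(k_coarse, k_fine)
```

The data and the model family are identical in both calls; only the assumed cost of encoding a parameter changes, and with it the model that the criterion prefers.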
A more refined approach is to use the Bayesian MDL [101], which, despite
a different philosophical background, is mathematically identical to Bayesian
model selection as applied here. In that sense, the approach presented in this
chapter can be said to be using the Bayesian MDL model selection criterion.
The latest MDL approach is theoretically optimal in the sense that it
minimises the worst-case coding length of the model. Mathematically, it is expressed
as the maximum likelihood normalised by the model complexity, where the
model complexity is its coding length summed over all possible model parameter
values [191]. Therefore, given continuous model parameters, as used here, the
complexity is infinite, which makes model comparison impossible. In addition,
the LCS structure makes computing the model complexity even for a finite set
of parameters extremely complicated, which makes it unlikely that, in its pure
form, the latest MDL measure will be of any use for LCS.
7.6.2 Structural Risk Minimisation
Structural Risk Minimisation (SRM) is based on minimising an upper bound on
the expected risk (3.1), given by the sum of the empirical risk (3.2) and a model
complexity metric based on the functional form of the model [218]. The
functional form of the model enters SRM through the model's
Vapnik-Chervonenkis (VC) dimensions. Having the empirical risk and the VC
dimensions of the model, we can find a model that minimises this bound on the
expected risk.
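As a rough sketch of the selection step just described, the following uses the classical VC confidence term for binary classification with 0/1 loss. This is an assumption made for illustration: the exact bound used in [218] for the setting considered here may differ, and the candidate models, their empirical risks and their VC dimensions are made-up numbers.

```python
import math

def vc_confidence(h, n, eta=0.05):
    """Classical VC confidence term for VC dimension h and n samples.

    The resulting bound holds with probability at least 1 - eta.
    """
    return math.sqrt((h * (math.log(2 * n / h) + 1) - math.log(eta / 4)) / n)

def srm_bound(empirical_risk, h, n, eta=0.05):
    """Upper bound on the expected risk: empirical risk plus complexity term."""
    return empirical_risk + vc_confidence(h, n, eta)

def srm_select(candidates, n, eta=0.05):
    """Pick the (name, empirical_risk, vc_dimension) tuple with the smallest bound."""
    return min(candidates, key=lambda c: srm_bound(c[1], c[2], n, eta))

# Hypothetical candidates: richer models fit better but have larger VC dimension.
candidates = [("small", 0.20, 3), ("medium", 0.08, 10), ("large", 0.05, 200)]
chosen = srm_select(candidates, n=1000)
print(chosen[0])
```

Even with 1000 samples the complexity term is several times larger than the empirical risks in this example, so the bound is dominated by the worst-case term rather than by the fit to the data.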
The difficulty of the SRM approach when applied to LCS is to find the VC
dimensions of the LCS model. For linear regression classifiers, the VC dimensions
are simply the dimensionality of the input space D_X. Mixing these models,
however, introduces non-linearity that makes evaluation of the VC dimensions
difficult. An additional weakness of SRM is that it deals with worst-case bounds
that apply to any distribution of the data, which causes the bound on the
expected risk to be quite loose and reduces its usefulness for model selection
[19].
A more powerful approach that provides us with a tighter bound on the
expected risk is to use data-dependent SRM. Such an approach has been applied
to the Mixtures-of-Experts model by Azran et al. [5, 4]. It remains to be
seen whether this approach can be generalised to the LCS model, as was done
here with the Bayesian MoE model to provide the Bayesian LCS model. If this
is possible, data-dependent SRM might be a viable alternative for defining the
optimal set of classifiers.