(though not significantly) and therefore our recommended choice. The heuristics
were designed for linear regression classifier models, but the same concepts apply
to designing heuristics for classification models.
The mixing model in XCS was never designed to maximise the data likelihood, and therefore the comparison to other heuristics might not seem completely fair. However, it was shown previously [83] that it also performs worst with respect to the mean squared error measure, and is thus not a good choice for a mixing model. Rather, mixing by inverse variance should be used as a drop-in replacement in XCS, although this recommendation is based more strongly on the previous experiments [83] (see Sect. 6.4) than on the empirical results presented here.
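To make the recommended heuristic concrete, the following is a minimal sketch of mixing by inverse variance: each matching classifier receives a mixing weight proportional to the inverse of its estimated noise variance, normalised over all matching classifiers. The function names and the zero-match fallback are illustrative assumptions, not part of any particular XCS implementation.

```python
import numpy as np

def inverse_variance_mixing(matching, variances):
    """Mixing weights proportional to 1/variance over matching classifiers.

    matching  -- 1 if the classifier matches the input, 0 otherwise
    variances -- estimated noise variance of each classifier's model
    """
    m = np.asarray(matching, dtype=float)
    inv_var = m / np.asarray(variances, dtype=float)
    total = inv_var.sum()
    if total == 0.0:
        # Hypothetical fallback: no classifier matches the input.
        return np.zeros_like(inv_var)
    return inv_var / total

def mixed_prediction(matching, variances, predictions):
    """Combine classifier predictions with inverse-variance mixing weights."""
    g = inverse_variance_mixing(matching, variances)
    return float(np.dot(g, predictions))
```

As the weights are normalised, classifiers with lower estimated variance contribute more to the mixed prediction, which is what makes this heuristic a reasonable surrogate for likelihood-maximising mixing.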
This chapter completes the discussion of how to find the LCS model parameters θ by the principle of maximum likelihood for a fixed model structure M. The next step is to provide a framework that additionally lets us find a good model structure, that is, a good set of classifiers. The approach taken is unable to identify good model structures at the model structure level M alone, but requires the reformulation of the probabilistic model itself to avoid overfitting even when finding the model parameters for a fixed model structure. This requires a deviation from the principle of maximum likelihood, which, however, does not completely invalidate the work presented in the last two chapters. Rather, the new update equations for parameter learning are, up to small modifications, similar to the ones that provide maximum likelihood estimates. Investigating these differences provides valuable insight into how exactly model selection infiltrates the parameter learning process.