the classifiers were trained according to this probability. While the gating net-
work still has the same purpose when the classifiers are trained independently,
the estimated probability is not fed back to the classifiers anymore. The cost of
this lack of feedback is a worse fit of the model to the data, which results in a
lower likelihood of the data given the model structure.
Note, however, that independent classifier training only causes a change in
the likelihood in areas where more than one classifier matches the same input.
Hence, we only get a lower likelihood if classifiers have large areas of overlap, and
it is doubtful that such a solution is ever desired. Nonetheless, the potentially
worse fit needs to be offset by the model structure search to find solutions with
sufficiently minimal overlap between different classifiers.
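To make the notion of independent training concrete, the following sketch (illustrative NumPy code, not an implementation from the text) fits each classifier by least squares weighted only by its own matching function; no responsibilities estimated by the gating network are fed back into the fit:

```python
import numpy as np

def train_classifiers_independently(X, y, matching):
    """Fit each classifier by matching-weighted least squares.

    X        : inputs, shape (N, D)
    y        : outputs, shape (N,)
    matching : matching values m_k(x_n) in [0, 1], shape (K, N)
    """
    K, N = matching.shape
    D = X.shape[1]
    W = np.zeros((K, D))
    for k in range(K):
        m = matching[k]
        # Each classifier sees only the inputs it matches; no
        # responsibilities from the gating network enter the fit.
        # The small ridge term is a practical safeguard against
        # singular systems, not part of the model.
        A = X.T @ (m[:, None] * X) + 1e-8 * np.eye(D)
        b = X.T @ (m * y)
        W[k] = np.linalg.solve(A, b)
    return W
```

Because each fit depends only on that classifier's matched inputs, classifiers with overlapping matching regions are trained without regard to one another, which is exactly the independence discussed above.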
As the gating network is not gating the observations to the different classifiers
anymore, but rather mixes the independently trained classifier models to best
explain the available observations, it will in the remaining chapters be referred
to as the mixing model rather than the gating network.
4.5 A Brief Comparison to Linear LCS Models
The LCS model introduced in this chapter is a non-linear model, as both the classifiers and the mixing model have tunable parameters. In its structure it is very similar to XCS and its derivatives, as well as to other LCS that train their classifiers independently (for example, CCS [153, 154]).
Another popular structure for LCS models is a linear one, which is characterised by the output estimate $f_M(\mathbf{x}; \theta)$ being a linear function of the model parameters $\theta$. Assuming a linear classifier model (4.15) and an output estimate $f_M(\mathbf{x}; \theta)$ formed by the mean of $p(y \mid \mathbf{x}, \theta)$ by (4.8), this estimate is given by
$$
f_M(\mathbf{x}; \theta) = \mathrm{E}(y \mid \mathbf{x}, \theta) = \sum_{k=1}^{K} g_k(\mathbf{x}) \, \mathbf{w}_k^\top \mathbf{x}. \tag{4.25}
$$
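As a hypothetical illustration of (4.25), the following sketch computes this mixture prediction, assuming a matching-weighted softmax gating of the form used for the generalised MoE (function and variable names are assumptions, not the book's):

```python
import numpy as np

def predict_moe(x, W, V, matching):
    """Mixture prediction of eq. (4.25): sum_k g_k(x) w_k^T x.

    x        : input vector, shape (D,)
    W        : classifier weights w_k, shape (K, D)
    V        : gating parameters v_k, shape (K, D)
    matching : matching values m_k(x), shape (K,)
    """
    scores = V @ x
    # Matching-weighted softmax gating (stabilised against overflow);
    # assumes at least one classifier matches x, so g sums to 1.
    g = matching * np.exp(scores - scores.max())
    g = g / g.sum()
    return g @ (W @ x)  # sum_k g_k(x) * w_k^T x
```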
In order for $f_M$ to be linear in $\theta$, the $g_k$'s have to be independent of the parameters, unlike for the generalised MoE, where they are parametrised by $\mathbf{V} \subset \theta$. This causes the log-likelihood $l(\mathcal{D}; \theta)$ to be concave with respect to the $\mathbf{w}_k$'s, with a unique maximum that is easy to find.
The linear LCS model can have different instantiations by specifying the $g_k$'s differently. Due to their current use in LCS, two of these instantiations are of particular interest. The first is given by $g_k(\mathbf{x}) = m_k(\mathbf{x})$, such that (4.25) becomes
$$
f_M(\mathbf{x}; \theta) = \sum_{k=1}^{K} m_k(\mathbf{x}) \, \mathbf{w}_k^\top \mathbf{x}. \tag{4.26}
$$
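By contrast, a minimal sketch of (4.26), again with assumed names, drops the parametrised gating and weighs each classifier's model by its matching function alone:

```python
import numpy as np

def predict_linear_lcs(x, W, matching):
    """Linear LCS prediction of eq. (4.26): sum_k m_k(x) w_k^T x.

    The matching values act directly as gating weights and need not
    sum to one, so the predictions of all matching classifiers are
    summed rather than averaged.
    """
    return matching @ (W @ x)
```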
Therefore, for each input x , the models of all matching classifiers are effectively
agglomerated. This clearly shows that the classifiers do not form their predictions
independently. As a consequence, classifiers cannot be interpreted as localised
models, but are rather localised components of the model that is formed by the
 