[Figure: directed graphical model with nodes $m_{nk}$, $x_n$, $m_k$, $\theta_k$ and $y_n$; plate labelled "data" of size $N$.]
Fig. 4.6. Directed graphical model for training classifier k independently. See the
caption of Fig. 4.1 for instructions on how to read this graph. Note that the values of
the matching random variables $m_{nk}$ are determined by the matching function $m_k$ and
the inputs $x_n$.
An additional consequence of classifiers being trained independently of the
responsibilities is that for standard choices of the local models (see, for example,
[121]), the log-likelihood (4.24) is concave for each classifier. Therefore, it has a
unique maximum and consequently we cannot get stuck in local maxima when
training individual classifiers.
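
As an illustration of why independent training cannot get stuck, the following is a minimal sketch that assumes a linear local model with Gaussian noise, which is one common choice but is not fixed by the text above. Under that assumption, maximising the matching-weighted log-likelihood of a single classifier reduces to weighted least squares, which has a closed-form solution. The names `fit_classifier` and `weighted_loglik`, the synthetic data, and the matching function "x > 0" are hypothetical and serve only this example.

```python
import numpy as np

def weighted_loglik(w, tau, X, y, m):
    """Matching-weighted Gaussian log-likelihood of one classifier.

    Assumed local model: y ~ N(w^T x, 1/tau). Each observation n
    contributes with weight m[n], its matching value.
    """
    resid = y - X @ w
    return np.sum(m * (0.5 * np.log(tau / (2.0 * np.pi)) - 0.5 * tau * resid**2))

def fit_classifier(X, y, m):
    """Maximise the weighted log-likelihood in closed form.

    X: (N, D) inputs, y: (N,) outputs, m: (N,) matching values in [0, 1].
    Returns the weight vector w and the maximum likelihood noise precision tau.
    """
    Xm = X * m[:, None]                      # scale each row by its matching value
    w = np.linalg.solve(Xm.T @ X, Xm.T @ y)  # weighted normal equations
    resid = y - X @ w
    tau = m.sum() / np.sum(m * resid**2)     # stationary point in tau
    return w, tau

# Synthetic data (assumption): y = 0.5 + 2 x + noise.
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=200)
X = np.column_stack([np.ones_like(x), x])    # bias term plus input
y = 0.5 + 2.0 * x + 0.1 * rng.standard_normal(200)

# Hypothetical matching function: the classifier matches inputs with x > 0.
m = (x > 0).astype(float)

w, tau = fit_classifier(X, y, m)
```

No responsibilities and no other classifier enter the fit, so the result depends only on the observations this classifier matches, and there is no iteration that could stall in a local maximum.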
4.4.4 Training the Gating Network
Training the gating network remains unchanged, and therefore is described by
(4.12) and (4.13). Given a set of trained classifiers, the responsibilities are fully
specified by evaluating (4.12). Hence, the log-likelihood of the gating network
(4.13) is a concave function (for example, [20]), and therefore has a unique
maximum.
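
The concavity claim can be checked numerically. The sketch below rests on assumptions that are not restated in this section: a gating network of generalised softmax form, g_k(x) proportional to m_k(x) exp(v_k^T phi(x)), responsibilities proportional to g_k(x_n) p(y_n | x_n, theta_k), and a gating objective of the form sum_n sum_k r_nk ln g_k(x_n). The function names, the random classifier likelihoods `L`, and the matching matrix `M` are hypothetical.

```python
import numpy as np

def gating(V, Phi, M):
    """Assumed generalised softmax: g[n, k] proportional to M[n, k] * exp(Phi[n] @ V[:, k]).

    Every input is assumed to be matched by at least one classifier.
    """
    a = Phi @ V
    a = a - a.max(axis=1, keepdims=True)   # shift for numerical stability
    e = M * np.exp(a)
    return e / e.sum(axis=1, keepdims=True)

def responsibilities(G, L):
    """r[n, k] proportional to g[n, k] * L[n, k], normalised over classifiers k.

    L[n, k] stands for the likelihood p(y_n | x_n, theta_k) of trained classifier k.
    """
    r = G * L
    return r / r.sum(axis=1, keepdims=True)

def gating_loglik(V, Phi, M, R):
    """Gating objective sum_n sum_k R[n, k] * ln g[n, k], with R held fixed."""
    G = gating(V, Phi, M)
    matched = M > 0                        # unmatched entries have R = 0 and are skipped
    return np.sum(R[matched] * np.log(G[matched]))

rng = np.random.default_rng(1)
N, K = 50, 3
x = rng.uniform(-1.0, 1.0, size=N)
Phi = np.column_stack([np.ones(N), x])     # gating features: bias plus input

M = (rng.uniform(size=(N, K)) > 0.4).astype(float)
M[:, 0] = 1.0                              # classifier 0 matches every input

L = rng.uniform(0.1, 1.0, size=(N, K))     # stand-in for classifier likelihoods
G0 = gating(np.zeros((2, K)), Phi, M)
R = responsibilities(G0, L)                # held fixed while the objective is evaluated
```

With `R` held fixed, evaluating `gating_loglik` at the midpoint of any two parameter matrices should give a value no smaller than the average of the two endpoint values, which is the defining property of a concave function.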
Thus, the classifier models have unique optima and can be trained independently
of the gating network by maximising a concave log-likelihood function.
Furthermore, the gating network depends on the goodness-of-fit of the classifiers,
but as they are trained independently, the log-likelihood function of the gating
network is also concave. Therefore, the complete model has a unique maximum
likelihood, and as a consequence, the second goal of removing local maxima to
ease training of the model is reached.
4.4.5 Implications on Likelihood and Assumptions about the Data
For observations that are matched by more than one classifier, letting each
classifier model fit every observation it matches with equal weight violates the
assumption that each observation was generated by one and only one classifier.
Rather, the model
of each classifier can be interpreted as a hypothesis for a data-generating process
that generated all observations of the matched area of the input space.
The gating network, on the other hand, was previously responsible for modelling
the probabilities of some classifier having produced some observation, and
 