4.4 Independent Classifier Training
The standard MoE model assumes that each observation is generated by one and only one classifier. This was generalised by adding the restriction that a classifier can have generated an observation only if it matches the input associated with that observation, thereby adding an additional layer of forced localisation of the classifiers in the input space.
Here, a change rather than a generalisation is introduced to the model assumptions: as before, the data is assumed to be generated by a combination of localised processes, but the role of a classifier changes from cooperating with other classifiers to locally model the observations that it matches, to modelling all observations that it matches, independently of the other classifiers that match the same inputs. This distinction becomes clearer once the resulting formal differences have been discussed in Sects. 4.4.2 and 4.4.3.
The motivation behind this change is twofold: firstly, it removes local maxima and thus simplifies classifier training; secondly, it simplifies the intuition behind what a classifier models. These motivations are discussed in more detail below, followed by their implications for training the model and for the assumptions about the data-generating process.
4.4.1 The Origin of Local Maxima
Following the discussion in Sect. 4.1.5, local maxima of the likelihood function are
the result of the simultaneous training of the classifiers and the gating network. In
the standard MoE model, this simultaneous training is necessary to provide the
localisation of the classifiers in the input space. In the introduced generalisation,
on the other hand, a preliminary layer of localisation is provided by the matching
function, and the interaction between classifiers and the gating network is only
required for inputs that are matched by more than one classifier. This was already
demonstrated in Sect. 4.3.3, where it was shown that classifiers acquire full
responsibility for inputs that they match alone. Hence, in the generalised MoE,
local maxima only arise when classifiers overlap in the input space.
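To illustrate why overlap is required for local maxima to arise, consider the following sketch in Python. The matching-weighted softmax form of the gating is taken from the generalised model of Sect. 4.3; all variable names and the toy matching functions are illustrative, not from the text. For an input matched by a single classifier, that classifier's gating value, and hence its responsibility, is 1 regardless of the gating parameters, so the gating has nothing to learn there.

import numpy as np

def gating(x, V, match):
    # Matching-restricted softmax: g_k(x) proportional to m_k(x) exp(v_k' x)
    m = np.array([m_k(x) for m_k in match])
    g = m * np.exp(V @ x)
    return g / g.sum()

# Two classifiers: classifier 0 matches x < 0.5, classifier 1 matches
# x > 0.3 (inputs carry a leading bias element).
match = [lambda x: float(x[1] < 0.5), lambda x: float(x[1] > 0.3)]
x_solo = np.array([1.0, 0.1])     # matched by classifier 0 alone
x_overlap = np.array([1.0, 0.4])  # matched by both classifiers

np.random.seed(0)
for V in (np.zeros((2, 2)), np.random.randn(2, 2)):
    print(gating(x_solo, V), gating(x_overlap, V))
# gating(x_solo, V) is [1, 0] for every V: the gating parameters, and with
# them the classifier/gating interaction, only matter where classifiers
# overlap.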
4.4.2 What Does a Classifier Model?
By (4.14), a classifier aims at maximising the sum of log-likelihoods of all observations, weighted by the responsibilities. By (4.12) and (4.22), these responsibilities can only be non-zero if the classifier matches the corresponding inputs, that is, $r_{nk} > 0$ only if $m_k(\mathbf{x}_n) > 0$. Hence, by maximising (4.14), a classifier only considers observations that it matches.
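For linear-Gaussian classifiers, as used in the preceding sections, this maximisation has a closed form: it reduces to responsibility-weighted least squares, in which observations with $r_{nk} = 0$ simply drop out of every sum. The following sketch makes this concrete; the routine name and its interface are illustrative rather than taken from the text.

import numpy as np

def fit_classifier(X, y, r_k):
    # Maximise sum_n r_nk ln N(y_n | w'x_n, var) over w and var. Unmatched
    # observations have r_nk = 0 and so contribute nothing to any sum.
    W = np.diag(r_k)
    w = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)
    resid = y - X @ w
    var = (r_k * resid**2).sum() / r_k.sum()
    return w, var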
Given that an observation $(\mathbf{x}_n, \mathbf{y}_n)$ is matched by a single classifier $k$, it was established in Sect. 4.3.3 that $r_{nk} = 1$ and $r_{nj} = 0$ for all $j \neq k$. Hence, (4.14) assigns full weight to classifier $k$ when maximising the likelihood of this observation. Consequently, given that all observations that a classifier matches are matched by that classifier alone, it is trained on them independently of all other classifiers.
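Taken to its conclusion, this suggests the training scheme that gives the section its name: each classifier can be fitted on its own, with its matching function taking the place of the responsibilities as the weight. A minimal sketch, reusing the fit_classifier routine from above (train_independently is again an illustrative name):

def train_independently(X, y, match):
    # Fit every classifier to all observations it matches, weighted by its
    # matching function alone; no gating quantities enter the objective.
    return [fit_classifier(X, y, np.array([m_k(x) for x in X]))
            for m_k in match]

Because each per-classifier objective no longer depends on the other classifiers or on the gating network, the local maxima discussed in Sect. 4.4.1 cannot arise within it.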
 