As classifiers can only generate observations if they match the corresponding input, the classifier model itself does not require any modification. Additionally, (4.9) is still valid, as $z_k = 1$ only if $m_k = 1$ by (4.20). Figure 4.3 shows the graphical model that, when compared to Fig. 4.1, illustrates the changes that are introduced by generalising the MoE model.
4.3.2 Updated Expectation-Maximisation Training
The only modifications to the standard MoE are changes to the gating network, expressed by $g_k$. As (4.12), (4.13) and (4.14) are independent of the functional form of $g_k$, they are still valid for the generalised MoE. Therefore, the expectation step of the EM-algorithm is again performed by evaluating the responsibilities by (4.12), and the gating and classifier models are updated by (4.13) and (4.14). Convergence of the algorithm is again monitored by (4.9).
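To make the update cycle concrete, the following is a minimal NumPy sketch of one EM iteration for a generalised MoE with linear classifiers. It assumes Gaussian classifier likelihoods with a shared variance var and the matched-softmax form of the gating (4.22) with the identity transfer function; the gating-weight update (4.13), usually performed by iteratively re-weighted least squares, is omitted, and all names (matched_softmax, e_step, m_step_classifiers, V, W, M) are illustrative rather than taken from the source.

import numpy as np

def matched_softmax(X, V, M):
    # Gating probabilities g_nk: a softmax over the gating activations,
    # restricted to matching classifiers (a sketch of the assumed form of
    # (4.22)). Assumes every input is matched by at least one classifier,
    # so that no row of G sums to zero.
    A = X @ V.T                          # gating activations, N x K
    A -= A.max(axis=1, keepdims=True)    # subtract row maximum for stability
    G = M * np.exp(A)                    # zero out non-matching classifiers
    return G / G.sum(axis=1, keepdims=True)

def e_step(X, y, V, W, var, M):
    # Responsibilities r_nk proportional to g_k(x_n) p(y_n | x_n, theta_k),
    # cf. (4.12), with Gaussian classifier models of shared variance var.
    # The Gaussian normalisation constant cancels in the row-wise division.
    G = matched_softmax(X, V, M)
    mu = X @ W.T                         # classifier predictions, N x K
    lik = np.exp(-0.5 * (y[:, None] - mu) ** 2 / var)
    R = G * lik
    return R / R.sum(axis=1, keepdims=True)

def m_step_classifiers(X, y, R):
    # Responsibility-weighted least squares per classifier, cf. (4.14),
    # with a small ridge term for numerical stability.
    K, D = R.shape[1], X.shape[1]
    W = np.empty((K, D))
    for k in range(K):
        Xw = X.T * R[:, k]               # weight each row of X by r_nk
        W[k] = np.linalg.solve(Xw @ X + 1e-8 * np.eye(D), Xw @ y)
    return W

Iterating e_step and m_step_classifiers, together with a gating-weight update, until (4.9) stops improving mirrors the EM cycle described above.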
4.3.3 Implications on Localisation
Localisation of the classifiers is achieved on the one hand by the matching functions of the classifiers, and on the other hand by the combined training of the gating network and classifiers.
Let us first consider the case when the $n$th observation $(x_n, y_n)$ is matched by one and only one classifier $k$, that is, $m_j(x_n) = 1$ only if $j = k$, and $m_j(x_n) = 0$ otherwise. Hence, by (4.22), $g_j(x_n) = 1$ only if $j = k$, and $g_j(x_n) = 0$ otherwise, and consequently, by (4.12), $r_{nj} = 1$ only if $j = k$, and $r_{nj} = 0$ otherwise. Therefore, full responsibility for the observation is given to the one and only matching classifier, independent of its goodness-of-fit.
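This collapse can be read directly off the gating model. Assuming (4.22) has the matched-softmax form with gating weights $v_j$ and transfer function $\phi$, a single matching classifier leaves only one non-zero term in both the numerator and the denominator:

$$ g_k(x_n) = \frac{m_k(x_n) \exp\left(v_k^\top \phi(x_n)\right)}{\sum_{j=1}^{K} m_j(x_n) \exp\left(v_j^\top \phi(x_n)\right)} = \frac{\exp\left(v_k^\top \phi(x_n)\right)}{\exp\left(v_k^\top \phi(x_n)\right)} = 1, $$

irrespective of the value of $v_k$, which is why the gating weights cannot modulate responsibilities where only a single classifier matches.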
On the other hand, assume that the same observation $(x_n, y_n)$ is matched by all classifiers, that is, $m_j(x_n) = 1$ for all $j \in \{1, \dots, K\}$, and assume the identity transfer function $\phi(x) = x$. In that case, (4.22) reduces to the standard MoE gating network (4.5), and we perform a soft linear partitioning as described in Sect. 4.1.4.
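Both limiting cases are easy to verify numerically. The sketch below repeats the illustrative matched_softmax helper from Sect. 4.3.2 so that it runs on its own; as before, this is an assumption about the form of (4.22), not code from the source, with $\phi$ taken as the identity.

import numpy as np

def matched_softmax(X, V, M):
    # Matched softmax gating, as in the sketch of Sect. 4.3.2.
    A = X @ V.T
    A -= A.max(axis=1, keepdims=True)
    G = M * np.exp(A)
    return G / G.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
x = rng.normal(size=(1, 3))              # a single input x_n
V = rng.normal(size=(4, 3))              # gating weights for K = 4 classifiers

# Only classifier 2 matches: it receives g = 1, whatever V is.
print(matched_softmax(x, V, np.array([[0., 0., 1., 0.]])))  # [[0. 0. 1. 0.]]

# All classifiers match: (4.22) reduces to the standard softmax gating (4.5).
A = x @ V.T
std = np.exp(A - A.max()) / np.exp(A - A.max()).sum()
print(np.allclose(matched_softmax(x, V, np.ones((1, 4))), std))  # True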
In summary, localisation by matching determines for which areas of the input space the classifiers attempt to model the observations. In areas where they match, they are distributed by soft linear partitions as in the standard MoE model. Hence, we can acquire a two-layer intuition of how localisation is performed: matching determines the rough areas where classifiers are responsible for modelling the observations, and the softmax function then performs the fine-tuning in areas of overlap between classifiers.
4.3.4 Relation to Standard MoE Model
The only difference between the generalised MoE model and the standard MoE model is the definition of the gating model $g_k$. Comparing the standard model (4.5) with its generalisation (4.22), the standard model is recovered from the generalisation by having $m_k(x) = 1$ for all $k$ and $x$, and the identity transfer function $\phi(x) = x$ for all $x$. Defining the matching functions in such a way is equivalent to having every classifier match the whole of the input space, leaving the soft linear partitioning of the softmax as the only source of localisation.
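Written out, and again assuming the matched-softmax form of (4.22), the reduction is immediate: with $m_k(x) = 1$ everywhere and $\phi(x) = x$,

$$ g_k(x) = \frac{m_k(x) \exp\left(v_k^\top \phi(x)\right)}{\sum_{j=1}^{K} m_j(x) \exp\left(v_j^\top \phi(x)\right)} = \frac{\exp\left(v_k^\top x\right)}{\sum_{j=1}^{K} \exp\left(v_j^\top x\right)}, $$

which is exactly the softmax gating (4.5) of the standard MoE model.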