[Fig. 4.3: directed graphical model with plates over the K classifiers and the N data points, containing the variables m_k, x_n, m_nk, z_nk, v_k, θ_k, y_n.]
Fig. 4.3. Directed graphical model of the generalised Mixtures-of-Experts model. See the caption of Fig. 4.1 for instructions on how to read this graph. When compared to the Mixtures-of-Experts model in Fig. 4.1, the latent variables z_nk depend additionally on the matching random variables m_nk, whose values are determined by the matching functions m_k and the inputs x_n.
that is, the value of a classifier's matching function determines the probability
of that classifier matching a certain input.
To enforce matching, the probability for classifier k having generated observation (x, y), given by (4.4), is redefined to be

$$
p(z_k = 1 \mid \mathbf{x}, \mathbf{v}_k, m_k) \propto
\begin{cases}
\exp\left(\mathbf{v}_k^\top \boldsymbol{\phi}(\mathbf{x})\right) & \text{if } m_k = 1 \text{ for } \mathbf{x},\\
0 & \text{otherwise},
\end{cases}
\tag{4.20}
$$
where φ is a transfer function, whose purpose will be explained later and which can for now be assumed to be the identity function, φ(x) = x. Thus, the differences from the previous definition (4.4) are the additional transfer function and the condition on m_k that locks the generation probability to 0 if the classifier does not match the input. Removing the condition on m_k by marginalising it out results in
$$
\begin{aligned}
g_k(\mathbf{x}) \propto p(z_k = 1 \mid \mathbf{x}, \mathbf{v}_k)
&= \sum_{m \in \{0,1\}} p(z_k = 1 \mid \mathbf{x}, \mathbf{v}_k, m_k = m)\, p(m_k = m \mid \mathbf{x})\\
&= 0 + p(z_k = 1 \mid \mathbf{x}, \mathbf{v}_k, m_k = 1)\, p(m_k = 1 \mid \mathbf{x})\\
&= m_k(\mathbf{x}) \exp\left(\mathbf{v}_k^\top \boldsymbol{\phi}(\mathbf{x})\right).
\end{aligned}
\tag{4.21}
$$
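The collapse of the two-term sum above can be checked numerically; the following is a minimal Python sketch, with made-up values for v_k, the input x, and the matching probability m_k(x):

```python
import numpy as np

# Small numerical sketch of the marginalisation in (4.21); the weight
# vector v_k, input x, and matching probability m_k(x) are hypothetical.
v_k = np.array([0.5, -0.2])
phi_x = np.array([1.0, 2.0])   # phi(x) = x, the identity transfer function
m_k_x = 0.7                    # p(m_k = 1 | x) = m_k(x)

# Sum over m in {0, 1}: by (4.20) the m = 0 branch contributes nothing.
total = 0.0 * (1.0 - m_k_x) + np.exp(v_k @ phi_x) * m_k_x
assert np.isclose(total, m_k_x * np.exp(v_k @ phi_x))  # equals (4.21)
```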
Adding the normalisation term, the gating network is now defined by
$$
g_k(\mathbf{x}) \equiv p(z_k = 1 \mid \mathbf{x}, \mathbf{v}_k)
= \frac{m_k(\mathbf{x}) \exp\left(\mathbf{v}_k^\top \boldsymbol{\phi}(\mathbf{x})\right)}
       {\sum_{j=1}^{K} m_j(\mathbf{x}) \exp\left(\mathbf{v}_j^\top \boldsymbol{\phi}(\mathbf{x})\right)}.
\tag{4.22}
$$
As can be seen when comparing it to (4.5), the additional layer of localisation is specified by the matching function, which reduces the gating to g_k(x) = 0 if the classifier does not match x, that is, if m_k(x) = 0.
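To make the gating computation concrete, here is a minimal Python sketch of (4.22). It assumes the identity transfer function and represents each matching function as a callable returning m_k(x) in [0, 1]; the names matched_gating, V, and matching are illustrative, not from the text:

```python
import numpy as np

def matched_gating(x, V, matching, phi=lambda x: x):
    """Compute g_k(x) for all K classifiers as in (4.22): a softmax over
    v_k^T phi(x), with each term weighted by the matching value m_k(x)."""
    m = np.array([m_k(x) for m_k in matching])  # m_k(x) for k = 1..K
    scores = m * np.exp(V @ phi(x))             # m_k(x) exp(v_k^T phi(x))
    return scores / scores.sum()                # normalise over j = 1..K

# Example: three classifiers over a two-dimensional input. The third one
# does not match x, so its gating is exactly 0, as noted above.
V = np.array([[0.5, -0.2],
              [0.1,  0.3],
              [0.8,  0.8]])
matching = [lambda x: 1.0,   # matches everywhere
            lambda x: 1.0,   # matches everywhere
            lambda x: 0.0]   # does not match this input
x = np.array([1.0, 2.0])
g = matched_gating(x, V, matching)
print(g)          # g[2] == 0.0
print(g.sum())    # 1.0
```

In a practical implementation one would subtract max_j v_j^T φ(x) from every score before exponentiating; the factor cancels between numerator and denominator, so (4.22) is unchanged, but overflow in the exponentials is avoided.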
 