where $r_{nk} \equiv \mathbb{E}(z_{nk})$ is commonly referred to as the responsibility of expert $k$ for observation $n$ [19] and, by the use of Bayes' rule and (4.8), evaluates to

$$
r_{nk} \equiv \mathbb{E}(z_{nk}) = p(z_{nk} = 1 \mid x_n, y_n, \theta)
= \frac{p(z_{nk} = 1 \mid x_n, v_k)\, p(y_n \mid x_n, \theta_k)}{p(y_n \mid x_n, \theta)}
= \frac{g_k(x_n)\, p(y_n \mid x_n, \theta_k)}{\sum_{j=1}^{K} g_j(x_n)\, p(y_n \mid x_n, \theta_j)}. \tag{4.12}
$$
Hence, the responsibilities are distributed according to the current gating and
goodness-of-fit of an expert in relation to the gating and goodness-of-fit of the
other experts.
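The normalisation in (4.12) can be illustrated with a minimal NumPy sketch. It assumes the gating values $g_k(x_n)$ and the expert likelihoods $p(y_n \mid x_n, \theta_k)$ have already been evaluated and stored as arrays; the function name `responsibilities` and the numbers used are hypothetical and serve only as an illustration.

```python
import numpy as np

def responsibilities(gating, likelihood):
    """Evaluate r_nk as in (4.12).

    gating:     (N, K) array with gating[n, k] = g_k(x_n)
    likelihood: (N, K) array with likelihood[n, k] = p(y_n | x_n, theta_k)
    """
    weighted = gating * likelihood  # numerator of (4.12) for every n and k
    # Divide by the sum over experts, i.e. the denominator of (4.12).
    return weighted / weighted.sum(axis=1, keepdims=True)

# Hypothetical values for two observations and two experts.
g = np.array([[0.5, 0.5],
              [0.8, 0.2]])
p = np.array([[0.9, 0.1],
              [0.2, 0.2]])
r = responsibilities(g, p)
# r is approximately [[0.9, 0.1], [0.8, 0.2]]
```

In the first row the gating is uniform, so the responsibilities follow the goodness-of-fit alone; in the second row both experts fit equally well, so the responsibilities equal the gating.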
The Maximisation Step
In the maximisation step we aim at adjusting the model parameters to maximise the expected complete data log-likelihood. $g_k(x_n)$ and $p(y_n \mid x_n, \theta_k)$ do not share any parameters, and so maximising (4.11) results in the two independent maximisation problems

$$
\max_{V} \sum_{n=1}^{N} \sum_{k=1}^{K} r_{nk} \ln g_k(x_n), \tag{4.13}
$$

$$
\max_{\theta} \sum_{n=1}^{N} \sum_{k=1}^{K} r_{nk} \ln p(y_n \mid x_n, \theta_k). \tag{4.14}
$$

Note that the responsibilities are evaluated with the previous model parameters and are not considered as being functions of these parameters. The function concerning the gating parameters $V$ can be maximised by the Iteratively Reweighted Least Squares (IRLS) algorithm as described in Chap. 6 (see also [121, 19]). The expert parameters can be modified independently, and the method depends on the expert model. Their training is described when introducing their models in Sect. 4.2.
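As a rough illustration of (4.13), the sketch below takes a single plain gradient ascent step on the gating objective while holding the responsibilities fixed. It is not the IRLS algorithm referred to above, only a simpler stand-in. It also assumes a softmax gating $g_k(x) = \exp(v_k^T x) / \sum_j \exp(v_j^T x)$, which is not defined in this section. The names `softmax_gating`, `gating_objective` and `gating_gradient_step`, the step size, and the random data are all hypothetical.

```python
import numpy as np

def softmax_gating(X, V):
    """g_k(x_n) under the assumed softmax gating; X is (N, D), V is (D, K)."""
    a = X @ V
    a -= a.max(axis=1, keepdims=True)  # shift for numerical stability
    e = np.exp(a)
    return e / e.sum(axis=1, keepdims=True)

def gating_objective(X, V, R):
    """Value of (4.13): sum over n and k of r_nk * ln g_k(x_n)."""
    return np.sum(R * np.log(softmax_gating(X, V)))

def gating_gradient_step(X, V, R, step=0.01):
    """One gradient ascent step on (4.13), treating R as constant."""
    G = softmax_gating(X, V)
    # Gradient w.r.t. v_k is sum_n (r_nk - g_k(x_n)) x_n,
    # which relies on each row of R summing to one.
    grad = X.T @ (R - G)
    return V + step * grad

# Hypothetical data: 20 observations, 2 input dimensions, 3 experts.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 2))
R = rng.dirichlet(np.ones(3), size=20)  # rows sum to one, like responsibilities
V = np.zeros((2, 3))
V_new = gating_gradient_step(X, V, R)
```

Because the responsibilities enter only as fixed weights, the update never differentiates through them, which mirrors the remark that they are not treated as functions of the parameters.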
To summarise, $l(\theta; \mathcal{D})$ is maximised by iterating over the expectation and the maximisation steps. In the expectation step, the responsibilities are computed for the current model parameters. In the maximisation step, the model parameters are updated with the computed responsibilities. Convergence of the algorithm can be determined by monitoring the result of (4.9).
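The alternation of the two steps, together with a convergence check, can be sketched on a deliberately simplified toy model. The simplifications are assumptions made for brevity and are not the models of this chapter: the gating is independent of the input (one mixing weight per expert), each expert is a unit-variance Gaussian with a constant mean, and the monitored quantity is taken to be the log-likelihood $\sum_n \ln \sum_k g_k \, p(y_n \mid \theta_k)$. The function name `em_toy` and its arguments are hypothetical.

```python
import numpy as np

def em_toy(y, K=2, max_iter=200, tol=1e-8, seed=0):
    """EM for a toy model with constant gating and constant-mean experts."""
    rng = np.random.default_rng(seed)
    g = np.full(K, 1.0 / K)                    # input-independent gating (assumption)
    mu = rng.choice(y, size=K, replace=False)  # initial expert means
    history = []                               # log-likelihood per iteration
    for _ in range(max_iter):
        # Expert likelihoods p(y_n | theta_k) for unit-variance Gaussians.
        lik = np.exp(-0.5 * (y[:, None] - mu[None, :]) ** 2) / np.sqrt(2 * np.pi)
        joint = g * lik                        # g_k * p(y_n | theta_k)
        evidence = joint.sum(axis=1)
        history.append(np.log(evidence).sum())
        # Stop once the log-likelihood no longer improves noticeably.
        if len(history) > 1 and history[-1] - history[-2] < tol:
            break
        # Expectation step: responsibilities for the current parameters.
        r = joint / evidence[:, None]
        # Maximisation step: closed-form updates for this toy model.
        g = r.mean(axis=0)
        mu = (r * y[:, None]).sum(axis=0) / r.sum(axis=0)
    return g, mu, history

# Hypothetical data drawn from two well-separated groups.
data_rng = np.random.default_rng(1)
y = np.concatenate([data_rng.normal(-2.0, 1.0, 100),
                    data_rng.normal(3.0, 1.0, 100)])
g, mu, history = em_toy(y)
```

The recorded `history` should never decrease from one iteration to the next, which is what makes the log-likelihood a usable convergence monitor.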
4.1.4 Localisation by Interaction
The experts in the standard MoE model are localised in the input space through the interaction of expert and gating network training: after the gating is randomly initialised, the responsibilities are calculated by (4.12) according to how well the experts fit the data in the areas of the input space that they are assigned to. In the maximisation step, performing (4.13) tunes the gating parameters
 