$$h_k(n) = E[I_k(n)] = P(k \mid x(n), \chi(n-1)) = \frac{P(k \mid \chi(n-1))\, p(x(n) \mid \chi(n-1), k)}{p(x(n) \mid \chi(n-1))} = \frac{g_k(\chi(n-1))\, p(x(n) \mid \chi(n-1), k)}{\sum_{k=1}^{K} g_k(\chi(n-1))\, p(x(n) \mid \chi(n-1), k)} \qquad (3.39)$$
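As a concrete illustration, the E step of (3.39) reduces to a few lines of array code once the expert likelihoods are taken to be Gaussian with variances $\sigma_k^2$, consistent with the form of the cost in (3.40) below. The following NumPy sketch is illustrative only; the function and array names are assumptions, not part of the original text.

```python
import numpy as np

def e_step(gate, err, sigma):
    """Posterior h_k(n) of (3.39), assuming Gaussian experts.

    gate  : (N, K) gate outputs g_k(chi(n-1)); each row sums to one
    err   : (N, K) prediction errors x(n) - x_tilde_k(n)
    sigma : (K,)   expert noise standard deviations
    """
    # Gaussian expert likelihoods p(x(n) | chi(n-1), k)
    lik = np.exp(-err**2 / (2.0 * sigma**2)) / (np.sqrt(2.0 * np.pi) * sigma)
    joint = gate * lik                                # numerator of (3.39)
    return joint / joint.sum(axis=1, keepdims=True)   # normalize over experts
```

Each EM iteration would evaluate these posteriors with the current gate outputs and expert predictions before re-fitting the parameters in the M step.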
Thus, $h_k(n)$ is the posterior probability of expert $k$, given both the current value of the time series and the recent past. For the M step, $L$ is maximized or, equivalently, the negative log-likelihood,
$$J = \sum_{n=1}^{N} \sum_{k=1}^{K} \left\{ -h_k(n)\,\log[g_k(\chi(n-1))] + h_k(n)\left[\frac{(x(n)-\tilde{x}_k(n))^2}{2\sigma_k^2} + \log\sigma_k\right] \right\} \qquad (3.40)$$
is globally minimized over the free parameters. The process is then repeated. If, in the M step, J is
only decreased and not minimized, then the process is called the generalized EM algorithm. This
is necessary when either the experts or the gate is nonlinear and a search for the global minimum is impractical.
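Continuing the sketch above, the cost of (3.40) can be evaluated directly from the posteriors, the gate outputs, and the expert prediction errors. This is a minimal sketch under the same Gaussian-expert assumptions, not the book's implementation:

```python
import numpy as np

def m_step_cost(h, gate, err, sigma, eps=1e-12):
    """Cost J of (3.40), minimized (or, for generalized EM, merely
    decreased) over the gate and expert parameters while the
    posteriors h are held fixed."""
    cross_entropy = -np.sum(h * np.log(gate + eps))                  # first term
    fit = np.sum(h * (err**2 / (2.0 * sigma**2) + np.log(sigma)))    # second term
    return cross_entropy + fit
```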
The first term in the summation of ( 3.40 ) can be regarded as the cross-entropy between
the posterior probabilities and the gate. It has a minimum when only one expert is valid and thus
encourages the experts to divide up the input space. To ensure that the outputs of the gate sum to
unity, the output layer of the MLP has a “softmax” transfer function,
$$g_k(\chi) = \frac{\exp[s_k(\chi)]}{\sum_{j=1}^{K}\exp[s_j(\chi)]} \qquad (3.41)$$
where $s_k$ is the $k$th input to the softmax. For a gate implemented as an MLP, the cross-entropy term in (3.40) cannot be minimized in a single step, and the generalized EM algorithm must be employed. If the gate is trained through gradient descent (backpropagation), the error backpropagated to the input side of the softmax at each time step is
$$\frac{\partial J}{\partial s_k} = g_k(\chi) - h_k \qquad (3.42)$$
This is the same backpropagated error that would result for an MSE criterion with the posterior probabilities acting as the desired signal. Thus, the posterior probabilities act as targets for the gate. For
each EM iteration, several training iterations may be required for the gate because it is implemented
using a multilayer perceptron.
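A minimal sketch of the gate side follows, assuming for brevity a single linear layer feeding the softmax rather than the full MLP of the text; it implements (3.41) and the backpropagated error of (3.42):

```python
import numpy as np

def softmax(s):
    """Softmax of (3.41), applied row-wise to the gate activations s."""
    z = np.exp(s - s.max(axis=1, keepdims=True))   # shift for numerical stability
    return z / z.sum(axis=1, keepdims=True)

def softmax_input_error(s, h):
    """Backpropagated error of (3.42): dJ/ds_k = g_k - h_k.

    With the posteriors h as targets, this matches the error an MSE
    criterion would produce, so the gate is trained by several
    gradient-descent (backpropagation) steps per EM iteration."""
    return softmax(s) - h
```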
There is an analytical solution for the experts, at each iteration, when they are linear predictors, $\tilde{x}_k(n) = w_k^T \chi(n-1)$ and $w_k = R_k^{-1} p_k$, where $R_k$ and $p_k$ are the weighted autocorrelation and cross-correlation matrices, respectively,
 