Biomedical Engineering Reference
In-Depth Information
[FIGURE 3.16: Mixture of experts during activation for a network with two experts. The diagram shows a tapped delay line (z⁻¹ blocks) supplying the input vector χ(n−1) = [x(n−1), …, x(n−M)] to Expert I and Expert II; their outputs x̃₁(n) and x̃₂(n) are weighted by the adaptable gate outputs g₁ and g₂ and summed to produce the mixture output x̃(n).]
explicit time dependence whenever possible. Also, there is an implicit iteration index in all the following equations. Given a time series of length N, we choose the free parameters of the predictors and gate that maximize the process log-likelihood. If the innovations are i.i.d., we can rewrite the process likelihood as
p(χ(N)) = ∏_{n=1}^{N} ∑_{k=1}^{K} g_k(χ(n−1)) · p_{ε_k}(x(n) − x̃_k(n))    (3.37)
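To make the likelihood concrete, here is a minimal sketch of evaluating it for a time series, under assumptions not fixed by the text: linear (AR-style) experts, a softmax gate driven by the same tap vector, and a Gaussian innovation density p_ε with common standard deviation `sigma`. All parameter names are illustrative.

```python
import numpy as np

def moe_log_likelihood(x, M, expert_weights, gate_weights, sigma=1.0):
    """Process log-likelihood in the spirit of eq. (3.37).

    x              : time series, shape (N,)
    expert_weights : (K, M) coefficients, one row per linear expert (hypothetical form)
    gate_weights   : (K, M) softmax-gate weights on the same tap vector (hypothetical form)
    sigma          : std of the assumed Gaussian innovation density p_eps
    """
    N = len(x)
    log_lik = 0.0
    for n in range(M, N):
        chi = x[n - M:n][::-1]               # tap vector chi(n-1) = [x(n-1), ..., x(n-M)]
        preds = expert_weights @ chi          # expert predictions x~_k(n)
        scores = gate_weights @ chi
        g = np.exp(scores - scores.max())
        g /= g.sum()                          # gate outputs g_k(chi(n-1)), sum to 1
        # Gaussian innovation density evaluated at the residual x(n) - x~_k(n)
        p_eps = np.exp(-0.5 * ((x[n] - preds) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))
        log_lik += np.log(np.sum(g * p_eps))  # log of the K-term mixture at time n
    return log_lik
```

Summing log terms rather than multiplying raw mixture values avoids the numerical underflow that a literal product over N factors would cause.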
This is difficult to maximize directly. Therefore, we introduce a latent binary indicator, I_k(n), indicating which expert is valid, allowing the likelihood to be written as
L = ∏_{n=1}^{N} ∏_{k=1}^{K} [ g_k(χ(n−1)) · p(x(n) | χ(n−1), k) ]^{I_k(n)}    (3.38)
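The point of the indicator exponent is that the logarithm of this double product separates into a sum, which is what makes each EM iteration tractable. A sketch of that step:

```latex
\log L \;=\; \sum_{n=1}^{N} \sum_{k=1}^{K} I_k(n)\,
\Bigl[\log g_k\bigl(\chi(n-1)\bigr) \;+\; \log p\bigl(x(n)\mid \chi(n-1),\,k\bigr)\Bigr]
```

Since exactly one I_k(n) equals 1 at each time step, only the valid expert's term contributes at time n.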
The indicator variable is “hidden,” in the sense that we do not know a priori which expert is valid at any time step. In the E step of the EM algorithm, for a given set of free parameters of the experts and gate, the entire data set is evaluated, holding the free parameters constant, to determine p(x(n) | χ(n−1), k) for all k and n. We then replace the indicator variables, I_k(n), at every time step, by their expected value:
E[I_k(n)] = g_k(χ(n−1)) · p(x(n) | χ(n−1), k) / ∑_{j=1}^{K} g_j(χ(n−1)) · p(x(n) | χ(n−1), j)
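The E step above can be sketched in code: for every time step, weight each expert's likelihood by its gate output and normalize over experts. This reuses the same illustrative linear-expert, softmax-gate, Gaussian-innovation assumptions as before; none of these modeling choices are fixed by the text.

```python
import numpy as np

def e_step(x, M, expert_weights, gate_weights, sigma=1.0):
    """Expected indicators E[I_k(n)] (responsibilities) for each time step.

    Returns an array of shape (N - M, K): one row per usable time step,
    one column per expert, each row summing to 1.
    """
    N = len(x)
    h = []
    for n in range(M, N):
        chi = x[n - M:n][::-1]               # tap vector chi(n-1)
        preds = expert_weights @ chi          # x~_k(n) for each expert
        scores = gate_weights @ chi
        g = np.exp(scores - scores.max())
        g /= g.sum()                          # gate outputs g_k(chi(n-1))
        # assumed Gaussian likelihood p(x(n) | chi(n-1), k)
        p = np.exp(-0.5 * ((x[n] - preds) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))
        num = g * p                           # numerator of E[I_k(n)]
        h.append(num / num.sum())             # normalize over experts j
    return np.array(h)
```

These responsibilities then weight each sample in the subsequent M step, so experts that explain a time step well receive most of the credit for it.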