Biomedical Engineering Reference
In-Depth Information
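[Figure: a tapped delay line (z^-1 elements) forms the input χ(n-1) = [x(n-1), ..., x(n-M)], which feeds Expert I, Expert II, and an adaptable gate. The expert predictions x̃_1(n) and x̃_2(n) are multiplied by the gate outputs g_1 and g_2 and summed to give the output x̃(n).]
FIGURE 3.16: Mixture of experts during activation for a network with two experts.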
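The activation pass in Figure 3.16 can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes linear experts and a softmax gate (neither is specified by the figure), and the names `delay_embed`, `softmax`, `moe_predict`, `expert_weights`, and `gate_weights` are hypothetical.

```python
import numpy as np

def delay_embed(x, n, M):
    """Return chi(n-1) = [x(n-1), ..., x(n-M)], the tapped-delay-line input."""
    return x[n - M:n][::-1]

def softmax(a):
    """Numerically stable softmax (assumed gate nonlinearity)."""
    e = np.exp(a - np.max(a))
    return e / e.sum()

def moe_predict(chi, expert_weights, gate_weights):
    """One activation step: gate-weighted sum of the expert predictions."""
    expert_preds = expert_weights @ chi      # x~_k(n), shape (K,); linear experts assumed
    gates = softmax(gate_weights @ chi)      # g_k(chi(n-1)), shape (K,), sums to 1
    x_hat = gates @ expert_preds             # x~(n) = sum_k g_k * x~_k(n)
    return x_hat, expert_preds, gates

# Illustrative data only: a sinusoid and random (untrained) weights.
rng = np.random.default_rng(0)
M, K = 3, 2                                  # delay-line length, number of experts
x = np.sin(0.3 * np.arange(50))
expert_weights = rng.normal(size=(K, M))
gate_weights = rng.normal(size=(K, M))

chi = delay_embed(x, n=10, M=M)
x_hat, expert_preds, gates = moe_predict(chi, expert_weights, gate_weights)
```

Because the gate outputs are nonnegative and sum to one under the softmax assumption, the mixture output is a convex combination of the two expert predictions.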
explicit time dependence whenever possible. Also, there is an implicit iteration index in all the following equations. Given a time series of length N, we choose the free parameters of the predictors and gate that maximize the process log-likelihood. If the innovations are i.i.d., we can rewrite the process likelihood as
$$
p(\chi(n)) = \prod_{n=1}^{N} \sum_{k=1}^{K} g_k(\chi(n-1))\, p_{\varepsilon_k}\big(x(n) - \tilde{x}_k(n)\big) \tag{3.37}
$$
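As a rough numerical illustration of (3.37), the sketch below evaluates the process log-likelihood from precomputed expert predictions and gate outputs. It assumes zero-mean Gaussian innovations with one standard deviation per expert, which the text does not state, and the names `gaussian_pdf`, `process_log_likelihood`, and `sigmas` are hypothetical.

```python
import numpy as np

def gaussian_pdf(e, sigma):
    """Zero-mean Gaussian density of the prediction error e (assumed innovation model)."""
    return np.exp(-0.5 * (e / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

def process_log_likelihood(x_true, expert_preds, gates, sigmas):
    """
    Log of the product over n of the gate-weighted sum over k in (3.37).

    x_true:       (N,)   observed samples x(n)
    expert_preds: (N, K) expert predictions x~_k(n)
    gates:        (N, K) gate outputs g_k(chi(n-1)); each row sums to 1
    sigmas:       (K,)   assumed innovation standard deviation per expert
    """
    errors = x_true[:, None] - expert_preds            # x(n) - x~_k(n)
    densities = gaussian_pdf(errors, sigmas[None, :])  # p_eps_k(.)
    mixture = np.sum(gates * densities, axis=1)        # sum over the K experts
    return np.sum(np.log(mixture))                     # log of the product over n

# Illustrative values only: N = 3 samples, K = 2 experts.
x_true = np.array([0.1, 0.4, -0.2])
expert_preds = np.array([[0.0, 0.3], [0.5, 0.2], [-0.1, 0.4]])
gates = np.array([[0.7, 0.3], [0.4, 0.6], [0.9, 0.1]])
sigmas = np.array([0.5, 1.0])
log_lik = process_log_likelihood(x_true, expert_preds, gates, sigmas)
```

The sum sits inside the logarithm, so the parameters of different experts are coupled in every term. For long series or very small densities, a log-sum-exp formulation would be more robust than taking the logarithm of the mixture directly.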
This is difficult to maximize directly. Therefore, we propose a latent binary indicator, $I_k$, indicating which expert is valid, allowing the likelihood to be written as
$$
L = \prod_{n=1}^{N} \prod_{k=1}^{K} \big[ g_k(\chi(n-1))\, p(x(n) \mid \chi(n-1), k) \big]^{I_k(n)} \tag{3.38}
$$
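One way to see why the indicator form (3.38) is easier to work with is to take its logarithm. The step below is a standard identity added for illustration, not an equation from the text: the exponent $I_k(n)$ becomes a multiplicative weight, and the product inside the brackets splits into a gate term and an expert term.

```latex
\log L = \sum_{n=1}^{N} \sum_{k=1}^{K} I_k(n)
         \Big[ \log g_k(\chi(n-1)) + \log p\big(x(n) \mid \chi(n-1), k\big) \Big]
```

Unlike the logarithm of (3.37), no sum remains inside a logarithm, so the gate and each expert contribute separate additive terms.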
The indicator variable is “hidden,” in the sense that we do not know a priori which expert is valid at any time step. In the E step of the EM algorithm, for a given set of free parameters of the experts and gate, the entire data set is evaluated, holding the free parameters constant, to determine $p(x(n) \mid \chi(n-1), k)$ and $g_k(\chi(n-1))$ for all k and n. We then replace the indicator variables, $I_k$, at every time step, by their expected value: