Digital Signal Processing Reference
Figure 6.5
Mixture of two expert networks. (Diagram: a gating network computes $g_1$ and $g_2$ from the input $x$; two expert networks compute $\mu_1$ and $\mu_2$ from the same input $x$.)
nonterminals of the tree.
Figure 6.5 shows the typical architecture of a mixture of experts. These networks receive the vector $x$ as input and produce scalar outputs that form a partition of unity at each point in the input space. They are linear with the exception of a single output nonlinearity. Expert network $i$ produces its output $\mu_i$ as a generalized linear function of the input vector $x$ and a weight vector $u_i$:

$$\mu_i = u_i^T x. \tag{6.8}$$
The neurons of the gating network are nonlinear. Let $\xi_i$ be an intermediate variable; then

$$\xi_i = v_i^T x, \tag{6.9}$$
where $v_i$ is a weight vector. Then the $i$th output of the gating network is the "softmax" function of $\xi_i$, given as

$$g_i = \frac{\exp(\xi_i)}{\sum_k \exp(\xi_k)}. \tag{6.10}$$
Note that $g_i > 0$ and $\sum_i g_i = 1$. The $g_i$'s can be interpreted as providing a "soft" partitioning of the input space.
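The forward pass described by Eqs. (6.8)–(6.10) can be sketched in a few lines of NumPy. This is an illustrative implementation, not code from the text: the function name, matrix shapes (one expert or gating weight vector per row), and the max-shift inside the softmax (a standard trick for numerical stability that leaves Eq. (6.10) unchanged) are assumptions.

```python
import numpy as np

def mixture_of_experts(x, U, V):
    """Forward pass of a mixture of linear experts (Eqs. 6.8-6.10).

    x : (d,)   input vector
    U : (m, d) expert weight vectors u_i, one per row
    V : (m, d) gating weight vectors v_i, one per row
    """
    mu = U @ x                     # expert outputs mu_i = u_i^T x, Eq. (6.8)
    xi = V @ x                     # gating activations xi_i = v_i^T x, Eq. (6.9)
    e = np.exp(xi - xi.max())      # shift by max(xi) for numerical stability
    g = e / e.sum()                # softmax gates g_i, Eq. (6.10)
    return g, mu, g @ mu           # gates, expert outputs, gated blend

# Two experts, four-dimensional input (arbitrary illustrative sizes).
rng = np.random.default_rng(0)
x = rng.normal(size=4)
U = rng.normal(size=(2, 4))
V = rng.normal(size=(2, 4))
g, mu, y = mixture_of_experts(x, U, V)
```

Because the gates come from a softmax, `g` satisfies the partition-of-unity property noted above ($g_i > 0$, $\sum_i g_i = 1$), so the blended output `y` is a convex combination of the expert outputs.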