Digital Signal Processing Reference
[Figure 6.5 (diagram): a gating network and two expert networks each receive the input vector x; the gating outputs g_1 and g_2 weight the expert outputs μ_1 and μ_2 to form the overall output μ.]
Figure 6.5 Mixture of two expert networks.
nonterminals of the tree.
Figure 6.5 shows the typical architecture of a mixture of experts.
The gating network receives the vector x as input and produces scalar
outputs g_i that form a partition of unity at each point in the input
space. The expert networks are linear with the exception of a single
output nonlinearity. Expert network i produces its output μ_i as a
generalized linear function of the input vector x and a weight vector u_i:

μ_i = u_i^T x    (6.8)
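As a minimal sketch of Eq. (6.8), each expert is just a dot product of its weight vector with the input; the weight values below are made up for illustration, not taken from the text:

```python
import numpy as np

# Each expert i computes a linear output mu_i = u_i^T x (Eq. 6.8).
# The input and the weight vectors u_1, u_2 are illustrative values.
x = np.array([1.0, 0.5, -0.2])        # input vector
U = np.array([[0.3, -0.1, 0.8],       # u_1 (expert 1 weights)
              [0.6,  0.2, -0.4]])     # u_2 (expert 2 weights)
mu = U @ x                            # expert outputs [mu_1, mu_2]
print(mu)                             # -> [0.09 0.78]
```

Stacking the weight vectors as rows of a matrix lets a single matrix-vector product evaluate all experts at once.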
The neurons of the gating networks are nonlinear.
Let ξ_i be an intermediate variable; then

ξ_i = v_i^T x    (6.9)

where v_i is a weight vector. The i-th output of the gating network is then the "softmax" function of ξ_i:

g_i = exp(ξ_i) / Σ_k exp(ξ_k).    (6.10)
Note that g_i > 0 and Σ_i g_i = 1. The g_i can be interpreted as providing
a "soft" partitioning of the input space.
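The full forward pass can be sketched by combining Eqs. (6.8)-(6.10): the softmax of the gating activations yields positive weights summing to one, which blend the expert outputs. The weight matrices below are made-up illustrative values:

```python
import numpy as np

x = np.array([1.0, 0.5, -0.2])        # input vector (illustrative)

# Gating network: xi_i = v_i^T x (Eq. 6.9), then softmax (Eq. 6.10).
V = np.array([[0.5, -0.3, 0.1],       # v_1 (illustrative)
              [-0.2, 0.7, 0.4]])      # v_2 (illustrative)
xi = V @ x
g = np.exp(xi) / np.sum(np.exp(xi))   # g_i > 0 and sum_i g_i = 1

# Expert networks: mu_i = u_i^T x (Eq. 6.8).
U = np.array([[0.3, -0.1, 0.8],
              [0.6,  0.2, -0.4]])
mu_experts = U @ x

# Mixture output: gating weights softly blend the expert outputs.
mu = g @ mu_experts
```

Because the g_i form a partition of unity, the mixture output always lies between the smallest and largest expert outputs, which is one way to see the "soft partitioning" interpretation.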