$$\sum_{k=1}^{K} g_k(n) = 1 \qquad (3.48)$$
Such a linear mixture can represent either a competitive or cooperative system, depending on
how the experts are penalized for errors, as determined by the cost function. In fact, it was in the
context of introducing their mixture of experts model that Jacobs et al. [53] first presented a cost
function that encourages competition among gated expert networks, which we generalize to
$$J(n) = \sum_{k=1}^{N} g_k(n)\, f\big(d(n) - y_k(\chi(n))\big) \qquad (3.49)$$
where d is the desired signal and f(d(n) − y_k(χ(n))) is a function of the error between the desired
signal and the output of the kth expert. Because the desired signal is the same for all experts, they all try to regress the
same data and are always in competition. This alone, however, is not enough to foster specialization.
The gate uses information from the performance of the experts to produce the mixing coefficients.
There are many variations of algorithms that fall within this framework. Let us discuss the important
components one at a time, starting with the design of the desired signal.
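As a concrete illustration of (3.48) and (3.49), the following Python sketch evaluates the competitive cost at a single time step; the function name and the choice of a squared-error penalty for f are illustrative assumptions, not part of the original formulation:

```python
import numpy as np

def competitive_cost(d_n, expert_outputs, gate, f=lambda e: e ** 2):
    """Evaluate the cost (3.49) at one time step.

    d_n            : desired signal d(n)
    expert_outputs : outputs y_k(x(n)) of the experts
    gate           : mixing coefficients g_k(n), summing to 1 as in (3.48)
    f              : penalty applied to each expert's error (squared error here)
    """
    errors = d_n - expert_outputs      # e_k(n) = d(n) - y_k(x(n))
    return np.dot(gate, f(errors))     # J(n) = sum_k g_k(n) f(e_k(n))

# Example: three experts, with the gate favoring the (accurate) second expert
d_n = 1.0
y = np.array([0.8, 1.05, 0.2])
g = np.array([0.2, 0.7, 0.1])          # satisfies (3.48)
print(competitive_cost(d_n, y, g))
```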
The formalism represented by (3.49) is a supervised algorithm, in that it requires a desired
signal. However, we are interested in a completely unsupervised algorithm. A supervised algorithm
becomes unsupervised when the desired signal is a fixed transformation T of the input, d(n) = T(χ(n)).
Although many transformations are possible, the two most common choices involve the delay
operator and the identity matrix, resulting in prediction as explained above and auto-association,
which yields a generative model for PCA [54].
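To make the two choices concrete, a minimal sketch follows; the test signal and variable names are ours, used only to show how the delay operator and the identity produce prediction and auto-association targets:

```python
import numpy as np

x = np.sin(0.1 * np.arange(100))   # input signal x(n)

# Prediction: the delay operator makes the desired signal the next sample,
# so each expert sees x(n) and must produce d(n) = x(n + 1).
x_pred, d_pred = x[:-1], x[1:]

# Auto-association: the identity makes the desired signal the input itself,
# d(n) = x(n), which yields a generative model for PCA.
x_auto, d_auto = x, x
```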
Gates can be classified into two broad categories, which we designate as input-based or output-based.
With input-based gating, the gate is an adaptable function of the input, g_k = g_k(χ(n)), that
learns to forecast which expert will perform best, as we have seen in the mixture of experts.
For output-based gating, the gate is a directly calculated function of the performance, and hence,
the outputs, of the experts. The gate in the annealed competition of experts of Pawelzik et al. [55]
implements memory in the form of a local boxcar average of the experts' squared errors. The self-
annealing competitive prediction of Fancourt and Principe also uses the local squared error, but with
a recursive estimator. The mixture of experts can also keep track of past expert performance, the
simplest example of which is the mixture model where the gate is expanded with memory to create
an estimate of the average of the posterior probabilities over the data set [51]. Perhaps the simplest
scheme is hard competition, for which the gate chooses the expert with the smallest magnitude
error in a winner-take-all fashion, as will be explained below. This method simplifies the architecture
and is very appropriate for system identification in a control framework because it simplifies the
design of controllers.
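A minimal sketch of output-based gating follows, assuming a boxcar window of each expert's recent squared errors; the exponential mapping from errors to mixing coefficients and the function names are illustrative choices, not the exact update rules of the cited algorithms:

```python
import numpy as np

def soft_gate(sq_err_window, beta=1.0):
    """Soft output-based gate: boxcar-average each expert's squared error and
    map lower error to a larger coefficient; the coefficients sum to 1."""
    local_err = sq_err_window.mean(axis=0)   # boxcar average per expert
    w = np.exp(-beta * local_err)
    return w / w.sum()

def hard_gate(sq_err_window):
    """Hard competition: a winner-take-all gate gives all the weight to the
    expert with the smallest local squared error."""
    local_err = sq_err_window.mean(axis=0)
    g = np.zeros_like(local_err)
    g[np.argmin(local_err)] = 1.0
    return g

# Example: a 10-sample window of squared errors for three experts
window = np.random.rand(10, 3) * np.array([1.0, 0.1, 0.5])
print(soft_gate(window))   # graded coefficients summing to 1
print(hard_gate(window))   # all weight on the best-performing expert
```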
 