Testing of the network on a 2D cursor control task demonstrated that the ESN can perform at the same level as the standard RMLP trained with BPTT. The CC values for the ESN x and y coordinates were 0.64 and 0.78, respectively, whereas the RMLP produced CC values of 0.66 and 0.79. However, the ESN was trained with far less computational effort.
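To make concrete why the ESN carries so little training burden, the following is a minimal sketch (not the authors' implementation) of an echo state network whose only trained parameters are the linear readout weights, fit by a single ridge-regression solve; the reservoir and input weights remain fixed at random values. All sizes, data, and variable names here are hypothetical, and the correlation coefficients computed at the end mirror the per-coordinate CC metric quoted above.

```python
import numpy as np

# Minimal echo state network sketch (hypothetical sizes, random stand-in data).
# Only the linear readout W_out is trained, which is why training is far
# cheaper than running BPTT on a recurrent MLP.

rng = np.random.default_rng(0)
n_inputs, n_reservoir, n_outputs = 100, 300, 2       # e.g., neural channels -> (x, y) cursor

W_in = rng.uniform(-0.5, 0.5, (n_reservoir, n_inputs))
W = rng.uniform(-0.5, 0.5, (n_reservoir, n_reservoir))
W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))      # spectral radius < 1 (echo state property)

def run_reservoir(U):
    """Drive the fixed reservoir with inputs U (T x n_inputs) and collect the echo states."""
    x = np.zeros(n_reservoir)
    states = np.zeros((len(U), n_reservoir))
    for t, u in enumerate(U):
        x = np.tanh(W_in @ u + W @ x)
        states[t] = x
    return states

# Hypothetical training data: binned neural rates U, desired cursor trajectory D.
T = 2000
U = rng.normal(size=(T, n_inputs))
D = rng.normal(size=(T, n_outputs))

X = run_reservoir(U)
ridge = 1e-4
W_out = np.linalg.solve(X.T @ X + ridge * np.eye(n_reservoir), X.T @ D).T  # ridge-regression readout

Y = X @ W_out.T                                       # decoded cursor positions
cc_x = np.corrcoef(Y[:, 0], D[:, 0])[0, 1]            # CC for the x coordinate
cc_y = np.corrcoef(Y[:, 1], D[:, 1])[0, 1]            # CC for the y coordinate
print(cc_x, cc_y)
```

The single least-squares solve above takes the place of the epoch-by-epoch gradient computation that BPTT requires for the RMLP, which is the source of the difference in training complexity.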
Lastly, we would like to mention the appeal of the ESN as a model for biologically plausible
computation. If one thinks of the echo states as neuronal states, we can see how a distributed, recur-
rent topology is capable of storing information about past inputs into a diffuse set of states (neu-
rons). For the most part, the interconnectivity and the value of the weights (synapses) are immaterial
for representation. They become, however, critical for readout (approximation). In this respect, there
are very close ties between the ESN and liquid state machines, as already mentioned by Maass et al. [43].
These ideas may become useful in developing new distributed paradigms for plasticity and charac-
terization of neuronal response in motor cortex.
3.2.3 Competitive Mixture of Local Linear Models
So far, we have investigated nonlinear decoding models that globally approximate an unknown nonlinear mapping between neural activity and behavior. However, a complex nonlinear modeling task can often be simplified by dividing it into simpler tasks and combining the pieces appropriately [46]; this is an application of the well-known divide-and-conquer approach used extensively in science and engineering. We briefly review here the statistical framework called mixture of experts [47] that implements this approach, and extend it to a more general model called gated competitive experts, which includes other neural architectures that have been used successfully in time series segmentation [48] and optimal control [49]. We summarize here the methodology originally presented in Reference [48].
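As a purely illustrative sketch of the divide-and-conquer idea (not the formulation of Reference [48]), the snippet below combines K local linear experts through a softmax gate; every name, size, and parameter in it is hypothetical.

```python
import numpy as np

# Gated mixture of K local linear experts (all names and sizes hypothetical).
# Each expert is a linear predictor of the embedded past-sample vector chi(n-1);
# the gate produces softmax weights that mix the expert predictions.

rng = np.random.default_rng(1)
K, M = 3, 10                                  # number of experts, embedding (memory) depth

W_experts = rng.normal(size=(K, M))           # one local linear model per regime
W_gate = rng.normal(size=(K, M))              # simple linear-softmax gating network

def predict(chi):
    """Blend the K expert outputs with the gate weights for one embedded input chi (length M)."""
    expert_out = W_experts @ chi              # K local linear predictions
    a = W_gate @ chi
    g = np.exp(a - a.max())
    g /= g.sum()                              # softmax gating weights (sum to 1)
    return g @ expert_out, g

chi = rng.normal(size=M)                      # a hypothetical embedded input
y_hat, gate = predict(chi)
print(y_hat, gate)
```

The soft gate shown here is only one way to assign responsibility for a sample; in competitive variants the prediction error of each local model can drive that assignment instead.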
Let us consider modeling a vector i.i.d.⁴ process that may be multimodal, assuming that the mixing process has memory, using a mixture model. To simplify the derivation, a 1D time series is assumed, but the results can easily be generalized to the multidimensional case. Recall that the multivariate PDF of a random process with an effective memory depth of M can be decomposed as
p(χ(n)) = ∏_{n=1}^{N} p(x(n) | χ(n − 1))    (3.31)
where χ(n − 1) = [x(n − 1), …, x(n − M + 1)]^T. Let us entertain the possibility that the random process is produced by K switching regimes. Therefore, we propose the joint conditional density p(k, x(n) | χ(n − 1)), where the discrete variable k = 1, …, K indicates the regime. We cannot observe
⁴ Independent, identically distributed.