where $\hat{\nabla}\zeta(n) = \partial E[e^2(n)]/\partial \mathbf{w}(n)$, which is called the weight decay gradient update. Both RR and weight decay can be viewed as implementations of a Bayesian approach to complexity control in supervised learning using a zero-mean Gaussian prior [10].
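To make the update concrete, here is a minimal sketch of a single stochastic-gradient step with weight decay; the function and parameter names are ours, a plain (unnormalized) LMS step is used for simplicity, and the shrinkage factor $(1 - \eta\delta)$ follows from differentiating the regularized cost $E[e^2(n)] + \delta\|\mathbf{w}(n)\|^2$, not from a listing in the text.

```python
import numpy as np

def lms_weight_decay_step(w, x, d, eta=0.01, delta=0.001):
    """One LMS step with weight decay; names and defaults are illustrative.

    w: weight vector; x: input vector (e.g., binned firing rates);
    d: desired output; eta: learning rate; delta: regularization amount.
    """
    e = d - np.dot(w, x)  # instantaneous prediction error
    # Differentiating e^2 + delta*||w||^2 and stepping downhill (constants
    # absorbed into eta) shrinks every weight toward zero each update:
    w_next = (1.0 - eta * delta) * w + eta * e * x
    return w_next, e
```

Because the shrinkage acts on every weight at every step, weights that receive little error-driven reinforcement decay toward zero, which is the pruning effect discussed for Figure 4.1 below.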
The choice of the amount of regularization (δ) plays an important role in generalization performance, because there is a trade-off between the condition number and the achievable MSE for a particular δ. A larger δ can decrease the condition number at the expense of increasing the MSE, whereas a smaller δ can decrease the MSE but also increase the condition number. Larsen et al. [17] proposed optimizing δ by minimizing the generalization error with respect to δ. Following this procedure, we utilize K-fold cross-validation [18], which divides the data into K randomly chosen disjoint sets, to estimate the average generalization error empirically,
$$\hat{\xi} = \frac{1}{K} \sum_{k=1}^{K} \varepsilon_k \qquad (4.18)$$
where $\varepsilon_k$ is the validation MSE for the $k$th set. Then, the optimal regularization parameter is learned by gradient descent,
$$\delta(n+1) = \delta(n) - \eta \frac{\partial \hat{\xi}(n)}{\partial \delta} \qquad (4.19)$$
where $\hat{\xi}(n)$ is an estimate computed with $\delta(n)$, and $\eta > 0$ is a learning rate. See Reference [17] for the procedure for estimating $\partial\hat{\xi}(n)/\partial\delta$ using weight decay. Once the procedure is applied, the change in weight distribution shown in Figure 4.1 can be obtained. Here, a linear BMI was trained with both weight decay and standard NLMS. As we can see, the solid line has many more weights with a value of zero, which means that during adaptation the extra degrees of freedom in the model were effectively eliminated. Eliminating the influence of many weights increased the testing correlation coefficient, with a corresponding decrease in its variance, as shown in Table 4.1.
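As a hedged illustration of Equations (4.18) and (4.19), the sketch below estimates $\hat{\xi}$ by K-fold cross-validation and descends on δ. The gradient-estimation procedure of Reference [17] is not reproduced here, so a central finite difference stands in for $\partial\hat{\xi}/\partial\delta$; all names and default values are our assumptions.

```python
import numpy as np

def kfold_mse(X, y, delta, K=10, seed=0):
    """Equation (4.18): average validation MSE over K random disjoint sets."""
    idx = np.random.default_rng(seed).permutation(len(y))
    folds = np.array_split(idx, K)
    errs = []
    for k in range(K):
        val = folds[k]
        trn = np.concatenate([folds[j] for j in range(K) if j != k])
        # Ridge (weight decay) solution on the training folds
        A = X[trn].T @ X[trn] + delta * np.eye(X.shape[1])
        w = np.linalg.solve(A, X[trn].T @ y[trn])
        errs.append(np.mean((y[val] - X[val] @ w) ** 2))
    return np.mean(errs)  # the estimate xi_hat

def optimize_delta(X, y, delta0=1e-2, eta=1e-3, steps=50, h=1e-4):
    """Equation (4.19), with a finite-difference stand-in for the
    gradient-estimation procedure of Reference [17]."""
    delta = delta0
    for _ in range(steps):
        # Central difference on the same fold split (fixed seed) keeps the
        # gradient estimate from being swamped by fold-to-fold noise.
        g = (kfold_mse(X, y, delta + h) - kfold_mse(X, y, delta - h)) / (2 * h)
        delta = max(delta - eta * g, 0.0)  # keep the regularizer nonnegative
    return delta
```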
4.1.2 Gamma Filter
The large number of parameters in decoding models is caused not only by the number of neurons but also by the number of time delays required to capture the history of neuronal firing over time. This attribute of the neural input topology was especially evident in the analysis of the TDNN. The problem is compounded in the context of neural decoding because, as we showed, the time resolution of the neural rate coding can influence performance. If smaller time bins are used in decoding, additional parameters are introduced. For example, although we use a 10-tap delay line for 100-msec bin sizes, the size of the delay line varies with the bin size (e.g., if we use a 50-msec time bin, then the number of time lags doubles for an equivalent memory depth).
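The arithmetic behind this tradeoff is easy to make explicit. In the sketch below (the function name and the neuron/output counts are illustrative assumptions), a fixed memory depth of 1 second requires 10 taps at 100-msec bins but 20 taps at 50-msec bins, doubling the weight count.

```python
def decoder_weight_count(memory_ms, bin_ms, n_neurons, n_outputs=3):
    """Weights in a linear decoder with one tap delay line per neuron."""
    taps = memory_ms // bin_ms  # delay-line length for this bin size
    return taps * n_neurons * n_outputs

# Hypothetical 100-neuron ensemble decoding 3 kinematic outputs:
print(decoder_weight_count(1000, 100, n_neurons=100))  # 10 taps -> 3000 weights
print(decoder_weight_count(1000, 50, n_neurons=100))   # 20 taps -> 6000 weights
```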