where $\hat{\nabla}\zeta(n) = \partial E[e^2(n)]/\partial \mathbf{w}(n)$, which is called the weight decay gradient update. Both RR and weight decay can be viewed as implementations of a Bayesian approach to complexity control in supervised learning using a zero-mean Gaussian prior [10].
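To make the update concrete, here is a minimal sketch of a single stochastic-gradient step with weight decay; the function and parameter names are ours, a plain (unnormalized) LMS step is used for simplicity, and the shrinkage factor $(1 - \eta\delta)$ follows from differentiating the regularized cost $E[e^2(n)] + \delta\|\mathbf{w}(n)\|^2$, not from a listing in the text.

```python
import numpy as np

def lms_weight_decay_step(w, x, d, eta=0.01, delta=0.001):
    """One LMS step with weight decay; names and defaults are illustrative.

    w: weight vector; x: input vector (e.g., binned firing rates);
    d: desired output; eta: learning rate; delta: regularization amount.
    """
    e = d - np.dot(w, x)  # instantaneous prediction error
    # Differentiating e^2 + delta*||w||^2 and stepping downhill (constants
    # absorbed into eta) shrinks every weight toward zero each update:
    w_next = (1.0 - eta * delta) * w + eta * e * x
    return w_next, e
```

Because the shrinkage acts on every weight at every step, weights that receive little error-driven reinforcement decay toward zero, which is the pruning effect discussed for Figure 4.1 below.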
The choice of the amount of regularization (δ) plays an important role in generalization performance, because there is a trade-off between the condition number and the achievable MSE for a particular δ. A larger δ can decrease the condition number at the expense of increasing the MSE, whereas a smaller δ can decrease the MSE but also increase the condition number. Larsen et al. [17] proposed optimizing δ by minimizing the generalization error with respect to δ. Following this procedure, we utilize K-fold cross-validation [18], which divides the data into K randomly chosen disjoint sets, to estimate the average generalization error empirically,
$$\hat{\xi} = \frac{1}{K} \sum_{k=1}^{K} \varepsilon_k \qquad (4.18)$$
where $\varepsilon_k$ is the validation MSE for the $k$th set. Then, the optimal regularization parameter is learned by gradient descent,
$$\delta(n+1) = \delta(n) - \eta \frac{\partial \hat{\xi}(n)}{\partial \delta} \qquad (4.19)$$
where $\hat{\xi}(n)$ is an estimate computed with $\delta(n)$, and $\eta > 0$ is a learning rate. See Reference [17] for the procedure for estimating $\partial\hat{\xi}(n)/\partial\delta$ using weight decay. Once the procedure is applied, the change in weight distribution shown in Figure 4.1 can be obtained. Here, a linear BMI was trained with both weight decay and standard NLMS. As we can see, the solid line has many more weights with a value of zero, which means that during adaptation the extra degrees of freedom in the model were effectively eliminated. Eliminating the influence of many weights increased the testing correlation coefficient, with a corresponding decrease in its variance, as shown in Table 4.1.
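As a hedged illustration of Equations (4.18) and (4.19), the sketch below estimates $\hat{\xi}$ by K-fold cross-validation and descends on δ. The gradient-estimation procedure of Reference [17] is not reproduced here, so a central finite difference stands in for $\partial\hat{\xi}/\partial\delta$; all names and default values are our assumptions.

```python
import numpy as np

def kfold_mse(X, y, delta, K=10, seed=0):
    """Equation (4.18): average validation MSE over K random disjoint sets."""
    idx = np.random.default_rng(seed).permutation(len(y))
    folds = np.array_split(idx, K)
    errs = []
    for k in range(K):
        val = folds[k]
        trn = np.concatenate([folds[j] for j in range(K) if j != k])
        # Ridge (weight decay) solution on the training folds
        A = X[trn].T @ X[trn] + delta * np.eye(X.shape[1])
        w = np.linalg.solve(A, X[trn].T @ y[trn])
        errs.append(np.mean((y[val] - X[val] @ w) ** 2))
    return np.mean(errs)  # the estimate xi_hat

def optimize_delta(X, y, delta0=1e-2, eta=1e-3, steps=50, h=1e-4):
    """Equation (4.19), with a finite-difference stand-in for the
    gradient-estimation procedure of Reference [17]."""
    delta = delta0
    for _ in range(steps):
        # Central difference on the same fold split (fixed seed) keeps the
        # gradient estimate from being swamped by fold-to-fold noise.
        g = (kfold_mse(X, y, delta + h) - kfold_mse(X, y, delta - h)) / (2 * h)
        delta = max(delta - eta * g, 0.0)  # keep the regularizer nonnegative
    return delta
```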
4.1.2 Gamma Filter
The large number of parameters in decoding models is caused not only by the number of neurons but also by the number of time delays required to capture the history of neuronal firing over time. This attribute of the neural input topology was especially evident in the analysis of the TDNN. The problem is compounded in the context of neural decoding because, as we showed, the time resolution of the neural rate coding can influence performance. If smaller time bins are used in decoding, additional parameters are introduced. For example, although we use a 10-tap delay line for 100-msec bin sizes, the size of the delay line varies with the bin size (e.g., if we use a 50-msec time bin, then the number of time lags doubles for an equivalent memory depth).
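The arithmetic behind this tradeoff is easy to make explicit. In the sketch below (the function name and the neuron/output counts are illustrative assumptions), a fixed memory depth of 1 second requires 10 taps at 100-msec bins but 20 taps at 50-msec bins, doubling the weight count.

```python
def decoder_weight_count(memory_ms, bin_ms, n_neurons, n_outputs=3):
    """Weights in a linear decoder with one tap delay line per neuron."""
    taps = memory_ms // bin_ms  # delay-line length for this bin size
    return taps * n_neurons * n_outputs

# Hypothetical 100-neuron ensemble decoding 3 kinematic outputs:
print(decoder_weight_count(1000, 100, n_neurons=100))  # 10 taps -> 3000 weights
print(decoder_weight_count(1000, 50, n_neurons=100))   # 20 taps -> 6000 weights
```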