have been increasingly used (Lotte and Guan 2009 ; Blankertz et al. 2010 ), as well
as Bayesian LDA (Hoffmann et al. 2008 ; Rivet et al. 2009 ). Both variants of LDA
are specifically designed to be more resistant to the curse of dimensionality through
the use of automatic regularization. As such, they have proven to be very effective
in practice, and superior to classical LDA. Indeed, the number of features is gen-
erally higher for ERP-based BCI than for those based on oscillatory activity.
This is because many time points are usually needed to describe an ERP, whereas only a few
frequency bands (or even a single one) suffice to describe oscillatory activity. Alternatively, feature
selection or channel selection techniques can also be used to deal with this high
dimensionality (Lotte et al. 2009a ; Rakotomamonjy and Guigue 2008 ; Krusienski
et al. 2006 ). As for BCI based on oscillatory activity, spatial
filters can also prove
very useful.
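As an illustration of the regularized classifiers mentioned above, a shrinkage LDA with automatic regularization is readily available in libraries such as scikit-learn. The following is a minimal sketch, not the exact formulation used in the cited studies; the arrays X and y are assumed to hold already-extracted ERP feature vectors and their class labels.

```python
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Shrinkage LDA: with the 'lsqr' solver, shrinkage='auto' sets the
# regularization strength automatically (Ledoit-Wolf estimate), which
# helps when features outnumber trials, as is common for ERP-based BCI
# (many time points x many channels).
clf = LinearDiscriminantAnalysis(solver='lsqr', shrinkage='auto')

# X: (n_trials, n_features) ERP feature vectors, y: (n_trials,) labels
# clf.fit(X_train, y_train)
# predictions = clf.predict(X_test)
```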
7.4.2 Spatial Filters for ERP-based BCI
As mentioned above, with ERP the number of features is usually quite large, with
many features per channel and many channels used. The tools described for
oscillatory activity-based BCI, i.e., feature selection, channel selection, or spatial
filtering can be used to deal with that. While feature and channel selection algo-
rithms are the same (these are generic algorithms), spatial
filtering algorithms for
ERP are different. One may wonder why CSP could not be used for ERP classi-
fication. This is because crucial information for classifying ERPs lies in the
EEG time course. However, CSP completely ignores this time course as it only
considers the average power. Therefore, CSP is not suitable for ERP classification.
Fortunately, other spatial filters have been specifically designed for this task.
One useful spatial filter is the Fisher spatial filter (Hoffmann et al. 2006).
This filter uses the Fisher criterion to achieve optimal class separability. Informally,
this criterion aims at maximizing the between-class variance, i.e., the distance
between the different classes (we want the feature vectors from the different classes
to be as far apart from each other as possible, i.e., as different as possible) while
minimizing the within-class variance, i.e., the distance between the feature vectors
from the same class (we want the feature vectors from the same class to be as
similar as possible). Formally,
this means maximizing the following objective
function:
J_{\mathrm{Fisher}} = \frac{\mathrm{tr}(S_b)}{\mathrm{tr}(S_w)} \quad (7.9)

with

S_b = \sum_{k=1}^{N_c} p_k (\bar{x}_k - \bar{x})(\bar{x}_k - \bar{x})^T \quad (7.10)
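To make the criterion concrete, the sketch below shows how such a Fisher-criterion spatial filter could be computed in Python/NumPy. It is only an illustration under stated assumptions, not the exact method of Hoffmann et al. (2006): the within-class scatter S_w (whose definition is not shown in this excerpt) is estimated here from single-trial deviations around the class means, the scatter matrices are summed over time samples to obtain channel-space matrices, and the filters are taken as the leading eigenvectors of pinv(S_w) S_b.

```python
import numpy as np

def fisher_spatial_filter(epochs, labels, n_filters=4):
    """Illustrative Fisher-criterion spatial filter (hypothetical helper).

    epochs : array (n_trials, n_channels, n_samples) of ERP epochs
    labels : array (n_trials,) of class labels
    Returns W (n_channels, n_filters), one spatial filter per column.
    """
    classes, counts = np.unique(labels, return_counts=True)
    priors = counts / len(labels)                      # class probabilities p_k

    grand_mean = epochs.mean(axis=0)                   # (n_channels, n_samples)
    class_means = [epochs[labels == c].mean(axis=0) for c in classes]

    n_channels = epochs.shape[1]
    S_b = np.zeros((n_channels, n_channels))
    S_w = np.zeros((n_channels, n_channels))

    # Between-class scatter (Eq. 7.10, summed over time samples):
    # weighted outer products of (class mean - grand mean)
    for p_k, mean_k in zip(priors, class_means):
        diff = mean_k - grand_mean
        S_b += p_k * diff @ diff.T

    # Within-class scatter (assumed form): deviations of single trials
    # from their own class mean, averaged over trials
    for c, mean_k in zip(classes, class_means):
        for trial in epochs[labels == c]:
            diff = trial - mean_k
            S_w += diff @ diff.T
    S_w /= len(labels)

    # Maximizing tr(W^T S_b W) / tr(W^T S_w W) leads to an eigenvalue
    # problem; keep the eigenvectors with the largest eigenvalues.
    eigvals, eigvecs = np.linalg.eig(np.linalg.pinv(S_w) @ S_b)
    order = np.argsort(eigvals.real)[::-1]
    return eigvecs[:, order[:n_filters]].real
```

In practice the filtered signals W^T x, rather than the raw channels, would then be fed to the ERP classifier.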