number, e.g., 48 channels in (Sannelli et al. 2010). However, simply using more channels will not solve the problem. Indeed, using more channels means extracting more features, thus increasing the dimensionality of the data and suffering more from the curse-of-dimensionality. As such, just adding channels may even decrease performance if too little training data is available. In order to efficiently exploit multiple EEG channels, three main approaches are available, all of which contribute to reducing the dimensionality:

• Feature selection algorithms: These are methods that automatically select a subset of relevant features among all the features extracted.
• Channel selection algorithms: These are similar methods that automatically select a subset of relevant channels among all channels available.
• Spatial filtering algorithms: These are methods that combine several channels into a single one, generally using weighted linear combinations, from which features will be extracted (a minimal sketch follows this list).

They are described below.
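Before those descriptions, the following sketch illustrates the idea behind the third approach: a spatial filter is simply a set of channel weights, so filtering reduces many recorded channels to a few "virtual channels" from which features are then extracted. The signal shapes and the random filter matrix are illustrative assumptions only; in practice the weights would be learned by an algorithm such as CSP.

```python
# Hypothetical sketch: spatial filtering as a weighted linear combination of
# EEG channels. Shapes and the random filter matrix W are illustrative only;
# in practice W would be obtained from a spatial filtering algorithm (e.g., CSP).
import numpy as np

n_channels, n_samples, n_filters = 48, 512, 4
eeg = np.random.randn(n_channels, n_samples)   # one trial: channels x time samples
W = np.random.randn(n_filters, n_channels)     # each row = one spatial filter (channel weights)

filtered = W @ eeg                              # a few "virtual channels" x time samples
# Features (e.g., band power) are then extracted from these few filtered signals
# instead of from all 48 original channels, reducing dimensionality.
band_power = np.log(np.var(filtered, axis=1))   # a common feature: log-variance per filter
print(filtered.shape, band_power.shape)          # (4, 512) (4,)
```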
7.3.2.1 Feature Selection
Feature selection algorithms are classical algorithms widely used in machine learning (Guyon and Elisseeff 2003; Jain and Zongker 1997) and as such are also very popular in BCI design (Garrett et al. 2003). There are two main families of feature selection algorithms (a brief code sketch illustrating both families follows the list):

• Univariate algorithms: They evaluate the discriminative (or descriptive) power of each feature individually. Then, they select the N best individual features (N needs to be defined by the BCI designer). The usefulness of each feature is typically assessed using measures such as the Student t-statistic, which quantifies the feature value difference between two classes, correlation-based measures such as R², or mutual information, which measures the dependence between the feature value and the class label (Guyon and Elisseeff 2003). Univariate methods are usually very fast and computationally efficient, but they are also suboptimal. Indeed, since they only consider individual feature usefulness, they ignore possible redundancies or complementarities between features. As such, the best subset of N features is usually not the N best individual features. As an example, the N best individual features might be highly redundant and measure almost the same information, so using them together would add very little discriminant power. On the other hand, adding a feature that is individually not very good but which measures different information from that of the best individual ones is likely to improve the discriminative power much more.
• Multivariate algorithms: They evaluate subsets of features together and keep the best subset with N features. These algorithms typically use measures of global performance for the subsets of features, such as measures of classification performance.
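As a rough illustration of the two families, the hedged sketch below ranks pre-extracted features individually with mutual information (univariate) and, separately, searches for a good feature subset with a classifier-in-the-loop wrapper (multivariate), using scikit-learn's SelectKBest and SequentialFeatureSelector. The random data, its shapes, and the choice of N are illustrative assumptions, not values from the chapter.

```python
# Hypothetical sketch: univariate vs. multivariate feature selection on
# pre-extracted BCI features (e.g., band-power values per channel and band).
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.feature_selection import (SelectKBest, mutual_info_classif,
                                       SequentialFeatureSelector)

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 96))   # 200 trials x 96 features (assumed shapes)
y = rng.integers(0, 2, size=200)     # two mental-task classes
N = 10                               # number of features to keep (set by the BCI designer)

# Univariate: score each feature independently (here with mutual information)
# and keep the N best individual features.
univariate = SelectKBest(score_func=mutual_info_classif, k=N).fit(X, y)
X_uni = univariate.transform(X)

# Multivariate (wrapper): evaluate feature subsets together via a classifier's
# cross-validated performance, greedily growing the subset up to N features.
lda = LinearDiscriminantAnalysis()
multivariate = SequentialFeatureSelector(lda, n_features_to_select=N,
                                         direction="forward", cv=5).fit(X, y)
X_multi = multivariate.transform(X)

print("univariate keeps features:  ", np.flatnonzero(univariate.get_support()))
print("multivariate keeps features:", np.flatnonzero(multivariate.get_support()))
```

The wrapper search is slower, but because it scores features jointly it can exploit complementarities and avoid redundant features, which the univariate ranking ignores.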
 