7.6 Experimental Comparative Analyses in Feature Selection
As in the previous case, FS is the data preprocessing technique in which the most effort has been invested, resulting in a huge collection of papers and proposals that can be found in the literature. Thus, in this section we refer to well-known comparative studies that have involved a large set of FS methods.
The first exhaustive comparison was done in [26], where the authors chose the 1-NN classifier and compared classical FS methods: a total of 18 methods, including 6 different versions of backward and forward selection, 2 bidirectional methods, 8 alternatives of branch and bound methods, and 2 genetic algorithm based approaches, one sequential and one parallel. The main conclusions point to the use of the bidirectional approaches for small and medium scale data sets (fewer than 50 features), the application of exhaustive methods such as branch and bound techniques being permissible up to medium scale data sets, and the suitability of genetic algorithms for large-scale problems.
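To make one of the compared families concrete, the following is a minimal sketch of sequential forward selection wrapped around a 1-NN classifier; the data set and subset size are arbitrary illustrative choices, not the experimental setup of [26].

from sklearn.datasets import load_wine
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

def forward_selection(X, y, k):
    """Greedy sequential forward selection with a 1-NN wrapper."""
    selected, remaining = [], list(range(X.shape[1]))
    knn = KNeighborsClassifier(n_neighbors=1)
    while remaining and len(selected) < k:
        # Add the candidate feature that most improves cross-validated accuracy
        best = max(remaining,
                   key=lambda f: cross_val_score(knn, X[:, selected + [f]], y).mean())
        selected.append(best)
        remaining.remove(best)
    return selected

X, y = load_wine(return_X_y=True)
print(forward_selection(X, y, k=5))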
Regarding studies based on evaluation measures, we stress the ones devoted to the inconsistency criterion. In [11], this simple measure is compared with others under different search strategies, as described in this chapter. The main characteristics extracted for this measure are that it is monotonic, fast, multivariate, able to remove redundant and/or irrelevant features, and capable of handling some noise. Using consistency in exhaustive, complete, heuristic, probabilistic and hybrid searches shows that it does not incorporate any search bias with regard to a particular classifier, enabling it to be used with a variety of different learning algorithms.
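The measure itself fits in a few lines; the sketch below assumes discrete-valued features stored in a NumPy array, and the function name is ours.

from collections import Counter

def inconsistency_rate(X, y, subset):
    # Group instances by their values on the selected features; within each
    # group, every instance outside the majority class is inconsistent.
    groups = {}
    for row, label in zip(X[:, subset], y):
        groups.setdefault(tuple(row), Counter())[label] += 1
    inconsistent = sum(sum(c.values()) - max(c.values()) for c in groups.values())
    return inconsistent / len(y)

Because removing features can only merge groups, and thus keep or raise the rate, the measure is monotonic, which is what allows it to be plugged into exhaustive and branch and bound style searches; a candidate subset is typically accepted when its rate does not exceed that of the full feature set.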
In addition, in [4], the state of the art of consistency-based FS methods is reviewed. An empirical evaluation is then conducted comparing them with wrapper approaches, concluding that both perform similarly in accuracy, but that the consistency-based feature selector is more efficient. Other studies address the impact of error estimation on FS: in [50], the true error of the optimal feature set is compared with the true error of the feature set found by an FS algorithm. The authors draw the conclusion that FS algorithms, depending on the sample size and the classification rule, can produce feature sets whose corresponding classifiers cause far more errors than the classifier corresponding to the optimal feature set.
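The phenomenon is easy to reproduce: ranking features by an optimistic error estimate on a small sample, then measuring error on a large held-out set, tends to show the gap. The sketch below is a toy simulation under our own arbitrary settings, not the experimental design of [50].

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=5000, n_features=20, n_informative=4,
                           n_redundant=0, random_state=0)
# Tiny training sample: error estimates computed on it are unreliable
X_tr, X_ho, y_tr, y_ho = train_test_split(X, y, train_size=40, random_state=0)

def top5(X_fit, y_fit):
    # Rank single features by resubstitution accuracy; keep the best five
    clf = LogisticRegression()
    scores = [clf.fit(X_fit[:, [f]], y_fit).score(X_fit[:, [f]], y_fit)
              for f in range(X_fit.shape[1])]
    return np.argsort(scores)[-5:]

for name, feats in (("selected on 40 samples", top5(X_tr, y_tr)),
                    ("selected on held-out bulk", top5(X_ho, y_ho))):
    # "True" error approximated on the large held-out portion
    err = 1 - LogisticRegression().fit(X_tr[:, feats], y_tr).score(X_ho[:, feats], y_ho)
    print(name, round(err, 3))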
Considering the creation of FS methods by a combination of a feature evaluation measure with a cutting criterion, the authors in [3] explored 6 × 6 = 36 combinations of feature selectors and compared them with an exhaustive experimental design. The conclusions achieved were: information theory based functions obtain better accuracy results; no cutting criterion can be generally recommended, although those independent of the measure are the best; and results vary among learners, recommending wrapper approaches for each kind of learner.
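The decomposition is direct to express in code: one function scores the features, another decides where to cut the ranked list. Below is a minimal sketch with one information-theoretic measure and two illustrative cutting criteria; these are our own choices, not the 36 combinations examined in [3].

import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import mutual_info_classif

def select(X, y, measure, cut):
    # Generic selector: any measure can be paired with any cutting criterion
    scores = measure(X, y)
    ranking = np.argsort(scores)[::-1]  # best-scored feature first
    return cut(ranking, scores)

def top_k(ranking, scores, k=10):       # independent of the measure's scale
    return ranking[:k]

def above_mean(ranking, scores):        # depends on the measure's values
    return ranking[scores[ranking] > scores.mean()]

X, y = load_breast_cancer(return_X_y=True)
print(select(X, y, mutual_info_classif, top_k))
print(select(X, y, mutual_info_classif, above_mean))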
The use of synthetic data for studying the performance of FS methods has been addressed in [7]. The rationale behind this methodology is to analyze the methods in the presence of a growing number of irrelevant features, noise in the data, redundancy and interaction between attributes, as well as varying ratios between the number of instances and the number of features. A total of nine feature selectors were run over 11 artificial data sets.