7.6 Experimental Comparative Analyses in Feature Selection
As in the previous case, FS is the data preprocessing technique in which the most effort has been invested, resulting in a huge collection of papers and proposals that can be found in the literature. Thus, in this section we refer to well-known comparative studies that have involved a large set of FS methods.
The first exhaustive comparison was done in [26], where the authors chose the 1-NN classifier and compared classical FS methods: a total of 18 methods, including 6 different versions of backward and forward selection, 2 bidirectional methods, 8 alternatives of branch and bound methods, and 2 genetic algorithm based approaches, one sequential and one parallel. The main conclusions point to the use of the bidirectional approaches for small and medium scale data sets (fewer than 50 features), the application of exhaustive methods such as branch and bound techniques being permissible up to medium scale data sets, and the suitability of genetic algorithms for large-scale problems.
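To make one of the compared families concrete, the following is a minimal sketch of sequential forward selection wrapped around a 1-NN classifier; the data set and subset size are arbitrary illustrative choices, not the experimental setup of [26].

from sklearn.datasets import load_wine
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

def forward_selection(X, y, k):
    """Greedy sequential forward selection with a 1-NN wrapper."""
    selected, remaining = [], list(range(X.shape[1]))
    knn = KNeighborsClassifier(n_neighbors=1)
    while remaining and len(selected) < k:
        # Add the candidate feature that most improves cross-validated accuracy
        best = max(remaining,
                   key=lambda f: cross_val_score(knn, X[:, selected + [f]], y).mean())
        selected.append(best)
        remaining.remove(best)
    return selected

X, y = load_wine(return_X_y=True)
print(forward_selection(X, y, k=5))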
Regarding studies based on evaluation measures, we stress the ones devoted to the inconsistency criterion. In [11], this simple measure is compared with others under different search strategies, as described in this chapter. The main characteristics extracted for this measure are that it is monotonic, fast, multivariate, able to remove redundant and/or irrelevant features, and capable of handling some noise. Using consistency in exhaustive, complete, heuristic, probabilistic and hybrid searches shows that it does not incorporate any search bias with regard to a particular classifier, enabling it to be used with a variety of different learning algorithms.
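The measure itself fits in a few lines; the sketch below assumes discrete-valued features stored in a NumPy array, and the function name is ours.

from collections import Counter

def inconsistency_rate(X, y, subset):
    # Group instances by their values on the selected features; within each
    # group, every instance outside the majority class is inconsistent.
    groups = {}
    for row, label in zip(X[:, subset], y):
        groups.setdefault(tuple(row), Counter())[label] += 1
    inconsistent = sum(sum(c.values()) - max(c.values()) for c in groups.values())
    return inconsistent / len(y)

Because removing features can only merge groups, and thus keep or raise the rate, the measure is monotonic, which is what allows it to be plugged into exhaustive and branch and bound style searches; a candidate subset is typically accepted when its rate does not exceed that of the full feature set.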
In addition, in [4], the state of the art of consistency-based FS methods is reviewed. An empirical evaluation is then conducted comparing them with wrapper approaches, concluding that both perform similarly in accuracy, but that the consistency-based feature selector is more efficient. Other studies address the impact of error estimation on FS: in [50], the true error of the optimal feature set is compared with the true error of the feature set found by an FS algorithm. The authors draw the conclusion that FS algorithms, depending on the sample size and the classification rule, can produce feature sets whose corresponding classifiers cause far more errors than the classifier corresponding to the optimal feature set.
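The phenomenon is easy to reproduce: ranking features by an optimistic error estimate on a small sample, then measuring error on a large held-out set, tends to show the gap. The sketch below is a toy simulation under our own arbitrary settings, not the experimental design of [50].

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=5000, n_features=20, n_informative=4,
                           n_redundant=0, random_state=0)
# Tiny training sample: error estimates computed on it are unreliable
X_tr, X_ho, y_tr, y_ho = train_test_split(X, y, train_size=40, random_state=0)

def top5(X_fit, y_fit):
    # Rank single features by resubstitution accuracy; keep the best five
    clf = LogisticRegression()
    scores = [clf.fit(X_fit[:, [f]], y_fit).score(X_fit[:, [f]], y_fit)
              for f in range(X_fit.shape[1])]
    return np.argsort(scores)[-5:]

for name, feats in (("selected on 40 samples", top5(X_tr, y_tr)),
                    ("selected on held-out bulk", top5(X_ho, y_ho))):
    # "True" error approximated on the large held-out portion
    err = 1 - LogisticRegression().fit(X_tr[:, feats], y_tr).score(X_ho[:, feats], y_ho)
    print(name, round(err, 3))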
Considering the creation of FS methods by a combination of a feature evaluation measure with a cutting criterion, the authors in [3] explored 6 × 6 = 36 combinations of feature selectors and compared them with an exhaustive experimental design. The conclusions achieved were: information theory based functions obtain better accuracy results; no cutting criterion can be generally recommended, although those independent of the measure are the best; and results vary among learners, recommending wrapper approaches for each kind of learner.
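The decomposition is direct to express in code: one function scores the features, another decides where to cut the ranked list. Below is a minimal sketch with one information-theoretic measure and two illustrative cutting criteria; these are our own choices, not the 36 combinations examined in [3].

import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import mutual_info_classif

def select(X, y, measure, cut):
    # Generic selector: any measure can be paired with any cutting criterion
    scores = measure(X, y)
    ranking = np.argsort(scores)[::-1]  # best-scored feature first
    return cut(ranking, scores)

def top_k(ranking, scores, k=10):       # independent of the measure's scale
    return ranking[:k]

def above_mean(ranking, scores):        # depends on the measure's values
    return ranking[scores[ranking] > scores.mean()]

X, y = load_breast_cancer(return_X_y=True)
print(select(X, y, mutual_info_classif, top_k))
print(select(X, y, mutual_info_classif, above_mean))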
The use of synthetic data for studying the performance of FS methods has been addressed in [7]. The rationale behind this methodology is to analyze the methods in the presence of a growing number of irrelevant features, noise in the data, redundancy and interaction between attributes, as well as varying ratios between the number of instances and the number of features. A total of nine feature selectors were run over 11 artificial data sets.