specialized literature is quite chaotic and is composed of hundreds of proposals,
ideas and applications related to FS. In fact, FS has surely been the best known
technique for data preprocessing and data reduction for years, being also the one
most hybridized with many DM tasks and paradigms. As it would be impossible to
summarize all of the literature on the topic, we will focus our efforts on the most
successful and popular approaches.
7.5.1 Leading and Recent Feature Selection Techniques
FS is, for most researchers, the basic data preprocessing technique, especially after
the year 2000. Unfortunately, the related literature is huge and quite chaotic, and
the different conventions and notations adopted make it difficult to understand or
categorize the differences among the hundreds of published algorithms. These are
the major reasons why it is impossible to summarize all the proposed feature selectors
in this book. Instead of describing individual approaches, we prefer to focus attention
on the main ideas that lead to updates and improvements with respect to the classical
FS methods reviewed in the previous sections. We intend to describe the most
influential methods and ideas (which are usually published in highly cited papers)
and the most recent and promising techniques published in high quality journals in
the DM, ML and Pattern Recognition fields.
Modifications of classical feature selectors cover a vast number of proposals in
the literature. Among the most representative, we can highlight some relevant
approaches. For example, in [28], the authors proposed an extension of the MIFS
algorithm under uniform information distribution (MIFS-U), and the combination
of the greedy search of MIFS with the Taguchi method. The same authors, in [27],
presented a speeded-up MIFS based on Parzen windows, which allows the computation
of MI without requiring a large amount of memory. A further advance on MI is the
minimal-redundancy-maximal-relevance (mRMR) criterion for incremental FS [42].
Based on the observation that, in traditional feature selectors, MI is estimated on
the whole sampling space, the authors in [32] proposed evaluation by dynamic MI,
which is estimated only on unlabeled instances. The normalized mutual information
FS (NMIFS) is proposed in [14] as an enhancement over the classical MIFS, MIFS-U,
and mRMR methods. Here, the average normalized MI is proposed as a measure of
redundancy among features. A unifying framework for information theoretic FS can
be found in [8].
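To make the incremental scheme shared by these methods concrete, the following is
a minimal sketch of the greedy mRMR criterion in its difference form, assuming the
features have already been discretized so that scikit-learn's mutual_info_score can
estimate MI; the function name mrmr and its parameters are illustrative, not taken
from [42].

import numpy as np
from sklearn.metrics import mutual_info_score

def mrmr(X, y, k):
    # Greedy mRMR (difference form): at each step pick the feature that
    # maximizes relevance I(f; y) minus its mean MI with the features
    # already selected. X holds integer-coded (discretized) features.
    n_features = X.shape[1]
    relevance = np.array([mutual_info_score(X[:, j], y)
                          for j in range(n_features)])
    selected = [int(np.argmax(relevance))]   # seed with the most relevant
    while len(selected) < k:
        candidates = [j for j in range(n_features) if j not in selected]
        scores = [relevance[j]
                  - np.mean([mutual_info_score(X[:, j], X[:, s])
                             for s in selected])
                  for j in candidates]
        selected.append(candidates[int(np.argmax(scores))])
    return selected

NMIFS follows the same greedy scheme, but normalizes each pairwise MI term before
averaging, which compensates for the bias of MI toward high-entropy features.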
Another widely studied method is Relief, together with its derivatives. In [45], a
theoretical and empirical analysis of this family of methods is conducted, concluding
that they are robust and noise tolerant, and that their computational complexity can
be alleviated through parallelism.
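As a reference point for this family, here is a minimal sketch of the basic two-class
Relief weight update, assuming features scaled to [0, 1] and an L1 distance; the
function relief and its parameters are illustrative rather than any specific variant
analyzed in [45].

import numpy as np

def relief(X, y, n_iterations=100, seed=None):
    # Basic two-class Relief: sample an instance, find its nearest
    # neighbor of the same class (hit) and of the other class (miss),
    # then reward features that differ on the miss and agree on the hit.
    rng = np.random.default_rng(seed)
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    for _ in range(n_iterations):
        i = rng.integers(n_samples)
        dists = np.abs(X - X[i]).sum(axis=1)
        dists[i] = np.inf                    # exclude the instance itself
        same = (y == y[i])
        same[i] = False
        hit = np.where(same)[0][np.argmin(dists[same])]
        miss = np.where(~same)[0][np.argmin(dists[~same])]
        w += (np.abs(X[i] - X[miss]) - np.abs(X[i] - X[hit])) / n_iterations
    return w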
Wrapper methods have been extensively studied by using classifiers such as SVMs
[34], or frameworks that jointly perform FS and SVM parameter learning [39].
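The wrapper idea itself is straightforward to reproduce. As an illustration, and not
the specific procedures of [34] or [39], the following sketch runs forward selection
scored by cross-validated SVM accuracy with scikit-learn; the dataset and parameter
values are placeholders.

from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)

# Forward selection: each candidate subset is scored by 5-fold
# cross-validated accuracy of an RBF-kernel SVM.
svm = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
selector = SequentialFeatureSelector(svm, n_features_to_select=10,
                                     direction="forward", cv=5)
selector.fit(X, y)
print(selector.get_support(indices=True))   # indices of selected features

Note that every candidate subset requires retraining the classifier, which is the
usual computational argument against wrappers on high-dimensional data.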
Other criteria related to separability measures and recently developed for performing
FS include the kernel class separability [57], which has been applied to a variety of
selection modes and different search strategies. In [15], the authors propose