All Relevant Feature Selection Methods and Applications - Feature Selection for Data and Pattern Recognition

Chapter 2

All Relevant Feature Selection Methods and Applications

Witold R. Rudnicki, Mariusz Wrzesień and Wiesław Paja

Abstract All-relevant feature selection is a relatively new sub-field in the domain of feature selection. The chapter gives a short review of the field and presents a representative algorithm. The problem of all-relevant feature selection is first defined, then key algorithms are described. Finally, the Boruta algorithm, under development at ICM, University of Warsaw, is explained in greater detail and applied to a collection of both synthetic and real-world data sets. It is shown that the algorithm is both sensitive and selective. The level of falsely discovered relevant variables is low: on average, less than one falsely relevant variable is discovered for each data set. The sensitivity of the algorithm is nearly 100% for data sets for which classification is easy, but may be lower for data sets for which classification is difficult. Nevertheless, it is possible to increase the sensitivity of the algorithm at the cost of increased computational effort without adversely affecting the false discovery level. This is achieved by increasing the number of trees in the random forest algorithm that delivers the importance estimate in Boruta.
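
As a rough illustration of how an importance estimate can separate relevant from irrelevant variables, the sketch below follows the shadow-feature idea that Boruta is generally described as using: each real feature's importance is compared with the best importance obtained from randomly permuted copies of the features. It is a simplified sketch under stated assumptions, not the Boruta implementation. In particular, the random forest importance is replaced by the absolute Pearson correlation with the label so that the example needs only the standard library, and a fixed hit-fraction threshold stands in for a statistical test. All function names, parameters, and the toy data are hypothetical.

```python
import random
import statistics


def abs_correlation(xs, ys):
    """Stand-in importance (assumption): absolute Pearson correlation with the label."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return abs(cov) / (var_x * var_y) ** 0.5


def shadow_selection(columns, labels, n_iter=30, min_hit_fraction=0.9, seed=0):
    """Return indices of features that beat the best shadow feature in most iterations."""
    rng = random.Random(seed)
    # The stand-in importance is deterministic, so it is computed once.
    # A random forest importance would be re-estimated in every iteration.
    real_importance = [abs_correlation(col, labels) for col in columns]
    hits = [0] * len(columns)
    for _ in range(n_iter):
        # Shadow features: permuted copies whose link to the label is destroyed.
        shadow_max = 0.0
        for col in columns:
            shadow = col[:]
            rng.shuffle(shadow)
            shadow_max = max(shadow_max, abs_correlation(shadow, labels))
        # A feature scores a hit when it is more important than every shadow.
        for j, imp in enumerate(real_importance):
            if imp > shadow_max:
                hits[j] += 1
    # Simplification (assumption): fixed threshold instead of a statistical test.
    return [j for j, h in enumerate(hits) if h >= min_hit_fraction * n_iter]


def make_toy_data(n_samples=300, n_noise=4, seed=1):
    """Two informative columns (indices 0 and 1) followed by pure-noise columns."""
    rng = random.Random(seed)
    labels = [rng.randint(0, 1) for _ in range(n_samples)]
    informative = [[y + rng.gauss(0, 0.5) for y in labels] for _ in range(2)]
    noise = [[rng.gauss(0, 1) for _ in labels] for _ in range(n_noise)]
    return informative + noise, labels


columns, labels = make_toy_data()
selected = shadow_selection(columns, labels)
print("selected feature indices:", selected)
```

On this toy data the two informative columns are strongly correlated with the label and should always be selected. Whether a given noise column is rejected depends on the random draw, which is why the full algorithm relies on repeated runs and a statistical test rather than a single comparison.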

Keywords All-relevant feature selection · Strong and weak relevance · Feature importance · Boruta · Random forest
