Chapter 2
All Relevant Feature Selection Methods
and Applications
Witold R. Rudnicki, Mariusz Wrzesień and Wiesław Paja
Abstract All-relevant feature selection is a relatively new sub-field in the domain
of feature selection. This chapter gives a short review of the field and presents
a representative algorithm. The problem of all-relevant feature selection
is first defined, then key algorithms are described. Finally, the Boruta algorithm,
under development at ICM, University of Warsaw, is explained in greater detail
and applied to a collection of both synthetic and real-world data sets. It is shown
that the algorithm is both sensitive and selective. The level of falsely discovered
relevant variables is low: on average, less than one falsely relevant variable is
discovered for each data set. The sensitivity of the algorithm is nearly 100% for data
sets for which classification is easy, but may be lower for data sets for which
classification is difficult. Nevertheless, the sensitivity can be increased at the cost
of increased computational effort, without adversely affecting the false discovery
level, by increasing the number of trees in the random forest algorithm that
delivers the importance estimate in Boruta.
Keywords All-relevant feature selection · Strong and weak relevance · Feature importance · Boruta · Random forest