13.3 Techniques for Feature Selection
Feature selection techniques can be used in many applications, from
choosing the most important socio-economic parameters for determining
whether a person can repay a bank loan to selecting the best set of
ingredients for a chemical process.
The filter approach operates independently of the data mining method
employed subsequently: undesirable features are filtered out of the
data before learning begins. These filtering
algorithms use heuristics based on general characteristics of the data to
evaluate the merit of feature subsets. A sub-category of filter methods,
referred to as rankers, includes methods that employ some criterion to
score each feature and provide a ranking. From this ordering, several feature
subsets can be chosen manually.
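
As an illustration of a ranker, the following minimal sketch scores each feature by the absolute value of its Pearson correlation with the target and sorts the features by that score. The choice of correlation as the criterion, the function names, and the toy data are assumptions made for this example only; any other per-feature score could be substituted.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric lists (0.0 if either is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return 0.0 if sx == 0 or sy == 0 else cov / (sx * sy)

def rank_features(columns, target):
    """Score every feature independently of any learner and return (ranking, scores)."""
    scores = {name: abs(pearson(values, target)) for name, values in columns.items()}
    ranking = sorted(scores, key=scores.get, reverse=True)
    return ranking, scores

# Hypothetical toy data: three candidate features and a numeric target.
columns = {
    "income": [1.0, 2.0, 3.0, 4.0, 5.0],
    "age": [5.0, 3.0, 4.0, 1.0, 2.0],
    "noise": [2.0, 2.0, 2.0, 2.0, 2.0],
}
target = [1.1, 1.9, 3.2, 3.9, 5.0]

ranking, scores = rank_features(columns, target)
top_two = ranking[:2]  # the cut-off is chosen manually from the ordering
```

Note that the ranker only produces the ordering; deciding how many of the top-ranked features to keep remains a manual choice.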
The wrapper approach [ Kohavi and John (2003) ] uses a learning
algorithm as a black box along with a statistical re-sampling technique
such as cross-validation to select the best feature subset according to some
predictive measure.
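
A wrapper can be sketched as follows. The learner here is a 1-nearest-neighbour classifier, treated purely as a black box, and the predictive measure is k-fold cross-validated accuracy. The learner, the exhaustive enumeration of subsets (feasible only for very few features), and the toy data are all assumptions chosen for illustration, not the method of the cited work.

```python
from itertools import combinations

def one_nn_predict(train_X, train_y, x):
    """Black-box learner: label of the nearest training point (squared Euclidean distance)."""
    nearest = min(range(len(train_X)),
                  key=lambda i: sum((a - b) ** 2 for a, b in zip(train_X[i], x)))
    return train_y[nearest]

def cv_accuracy(X, y, features, folds=3):
    """Cross-validated accuracy of the learner when it sees only the given feature indices."""
    projected = [[row[j] for j in features] for row in X]
    correct = 0
    for fold in range(folds):
        test_idx = [i for i in range(len(X)) if i % folds == fold]
        train_idx = [i for i in range(len(X)) if i % folds != fold]
        train_X = [projected[i] for i in train_idx]
        train_y = [y[i] for i in train_idx]
        correct += sum(one_nn_predict(train_X, train_y, projected[i]) == y[i]
                       for i in test_idx)
    return correct / len(X)

def wrapper_select(X, y, folds=3):
    """Evaluate every non-empty feature subset and keep the best-scoring one."""
    n_features = len(X[0])
    best_subset, best_score = None, -1.0
    for size in range(1, n_features + 1):
        for subset in combinations(range(n_features), size):
            score = cv_accuracy(X, y, subset, folds)
            if score > best_score:  # strict '>' prefers the smaller subset on ties
                best_subset, best_score = subset, score
    return best_subset, best_score

# Hypothetical toy data: feature 0 separates the classes, feature 1 is noise.
X = [[0.0, 9.0], [0.1, 1.0], [0.2, 5.0], [1.0, 1.5], [1.1, 8.5], [1.2, 5.5]]
y = [0, 0, 0, 1, 1, 1]

selected, accuracy = wrapper_select(X, y)
```

Because every candidate subset requires retraining and re-evaluating the learner, the cost of a wrapper grows quickly with the number of features, which is why practical wrappers pair the evaluation with a heuristic search rather than full enumeration.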
The embedded approach [ Guyon and Elisseeff (2003) ] is similar to the
wrapper approach in the sense that the features are specifically selected
for a certain learning algorithm. However, in the embedded approach the
features are selected in the process of learning.
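
Decision tree induction is a convenient illustration of the embedded idea: the features that end up in the tree are chosen while the model is being learned. The sketch below grows a small ID3-style tree on binary features using information gain and records which features it used. The splitting rule, the function names, and the toy data are assumptions for this example.

```python
from collections import Counter
from itertools import product
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a non-empty list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def majority(labels):
    return Counter(labels).most_common(1)[0][0]

def grow_tree(rows, labels, features, used):
    """Grow a tree on binary (0/1) features; every feature split on is added to `used`."""
    if len(set(labels)) == 1 or not features:
        return majority(labels)

    def gain(f):
        remainder = 0.0
        for value in (0, 1):
            part = [l for r, l in zip(rows, labels) if r[f] == value]
            if part:
                remainder += len(part) / len(labels) * entropy(part)
        return entropy(labels) - remainder

    best = max(features, key=gain)
    if gain(best) <= 1e-12:  # no informative split left: stop with a leaf
        return majority(labels)
    used.add(best)
    remaining = [f for f in features if f != best]
    node = {"feature": best}
    for value in (0, 1):
        subset = [(r, l) for r, l in zip(rows, labels) if r[best] == value]
        node[value] = grow_tree([r for r, _ in subset], [l for _, l in subset],
                                remaining, used)
    return node

# Hypothetical toy data: the label depends on features 0 and 2; feature 1 is irrelevant.
rows = [list(bits) for bits in product((0, 1), repeat=3)]
labels = [r[0] & r[2] for r in rows]

used = set()
tree = grow_tree(rows, labels, [0, 1, 2], used)
```

Here no separate selection step is run: the irrelevant feature is simply never chosen for a split, so the selected subset is a by-product of learning.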
While most feature selection methods have been applied to supervised
methods (such as classification and regression), there are important
works that deal with unsupervised methods [ Wolf and Shashua (2005) ] .
Feature selection algorithms search through the space of feature subsets
in order to find the best subset. This subset search has four major properties
[ Langley (1994) ] :
Starting Point: Selecting a point in the feature subset space from which
to begin the search can affect the direction of the search.
Search Organization: An exhaustive search of the feature subset space
is prohibitive for all but a small initial number of features, since n
features yield 2^n possible subsets.
Evaluation Strategy: How feature subsets are evaluated (filter, wrapper
and ensemble).
Stopping Criterion: A feature selector must decide when to stop
searching through the space of feature subsets.
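
The four properties can be seen together in a minimal sketch of greedy forward selection, given here under stated assumptions: the search starts from a caller-supplied subset (empty by default), is organized as single-feature additions, uses a pluggable evaluation function, and stops when no addition improves the score. The function names and the toy merit function are hypothetical and stand in for a real filter or wrapper evaluation.

```python
def forward_selection(n_features, evaluate, start=frozenset(), min_gain=1e-9):
    """Greedy forward search over feature subsets.

    evaluate: function mapping a frozenset of feature indices to a numeric merit.
    """
    current = frozenset(start)            # starting point
    current_score = evaluate(current)
    while True:
        # Search organization: consider only subsets that add one feature.
        candidates = [current | {f} for f in range(n_features) if f not in current]
        if not candidates:
            break
        best = max(candidates, key=evaluate)   # evaluation strategy
        best_score = evaluate(best)
        # Stopping criterion: halt when the best addition does not improve the merit.
        if best_score - current_score < min_gain:
            break
        current, current_score = best, best_score
    return current, current_score

# Hypothetical merit: per-feature usefulness minus a fixed cost of 2 per feature kept.
usefulness = {0: 5, 1: 0, 2: 3, 3: 1}

def toy_merit(subset):
    return sum(usefulness[f] for f in subset) - 2 * len(subset)

subset, score = forward_selection(4, toy_merit)
```

Starting instead from the full feature set and removing one feature at a time would give backward elimination; only the starting point and the candidate generation change, while the evaluation and stopping logic stay the same.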
The next sections provide a detailed description of feature selection
techniques for each of the properties described above.