object belonging to a class, and each column represents a characterizing feature. From a geometric viewpoint, the objects can be thought of as points positioned in an m-dimensional feature space. The solution to a classification problem can then be seen as the procedure that finds the hyperplanes which, in the feature space, separate the classes of points.
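To make this geometric view concrete, the following minimal sketch (assuming Python with NumPy and scikit-learn, tools not prescribed by the text) fits a linear classifier to two classes of points in a two-dimensional feature space; the fitted coefficients define a separating hyperplane w·x + b = 0.

```python
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
# Two classes of points in a 2-dimensional feature space.
class_a = rng.normal(loc=[-2, -2], scale=0.8, size=(50, 2))
class_b = rng.normal(loc=[2, 2], scale=0.8, size=(50, 2))
X = np.vstack([class_a, class_b])
y = np.array([0] * 50 + [1] * 50)

# The fitted model defines a hyperplane w . x + b = 0 separating the classes.
clf = LinearSVC(C=1.0).fit(X, y)
w, b = clf.coef_[0], clf.intercept_[0]
print(f"hyperplane: {w[0]:.2f}*x1 + {w[1]:.2f}*x2 + {b:.2f} = 0")
```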
A broad group of FS techniques is based on the construction and ranking of new features [11, 16]. The Feature Extraction (FE) process transforms the original set of real features into linear combinations of them, so that the power to discriminate among classes is concentrated in a reduced number of extracted features. The relevancy of each individual feature is then evaluated: the set of eigenvalues associated with the transformation process represents the relative relevancy of each extracted feature and allows ranking them. It is important to notice, however, that the FE methodology was conceived primarily for data compression. It therefore effectively reduces the size of the initial volume of data, but it requires the entire dataset to be available in order to construct each extracted feature; the FE approach is thus of no help in applications where containing data acquisition costs is important. Furthermore, the FE model is very application specific, since the extracted features are uniquely associated with a dataset.
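A minimal sketch of this FE idea, under the assumption of a Python/NumPy implementation (our own illustration, not one given in the text): the eigendecomposition of the data covariance yields extracted features that are linear combinations of the original ones, and the eigenvalues rank their relative relevancy. Note how every original column is needed to construct each extracted feature, which is exactly why FE does not reduce data acquisition costs.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))             # 200 objects, 5 original features
X[:, 1] = 0.9 * X[:, 0] + 0.1 * X[:, 1]   # introduce correlation

# Covariance of the centered data: its eigenvectors define the linear
# combinations (extracted features), its eigenvalues their relevancy.
Xc = X - X.mean(axis=0)
cov = np.cov(Xc, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)

# Rank extracted features by decreasing eigenvalue and keep the top two.
order = np.argsort(eigvals)[::-1]
top = eigvecs[:, order[:2]]
Z = Xc @ top                              # reduced representation
print("eigenvalues (sorted):", np.round(eigvals[order], 3))
print("reduced shape:", Z.shape)
```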
For FS, three modes of application have been identified [6, 16, 29] in relation to the dependence on the classification algorithm. In the wrapper mode, selection and classification are iterated to refine the selection of features until an optimal performance of the classification algorithm is achieved. The exploration of the solution space can be conducted either by brute force or by heuristic approaches. The wrapper mode is supervised and is not suitable for real-time applications, although some solutions have been proposed that increase its performance while avoiding its procedural complexity [28]. By contrast, in the filter mode the features that satisfy a general criterion of relevancy for a classification process are selected. The filter method is applied in a single step, independently of the classification algorithm. In the embedded mode, FS is part of the model training process, and feature relevancy is obtained by analyzing the features' utility in optimizing the objective function of the learning model; an application example is given in [30]. From a practical point of view, these three methods represent different levels of trade-off between ease of execution and accuracy of the results.
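The three modes can be sketched side by side with off-the-shelf selectors (a Python/scikit-learn illustration under our own choice of algorithms, not one made by the text): a univariate filter, a wrapper that iterates selection with a classifier, and an embedded L1-penalized model whose training itself zeroes out irrelevant features.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif, RFE
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=300, n_features=10,
                           n_informative=3, random_state=0)

# Filter mode: one pass, independent of any classification algorithm.
filt = SelectKBest(score_func=f_classif, k=3).fit(X, y)
print("filter picks:", np.flatnonzero(filt.get_support()))

# Wrapper mode: selection and classification iterated together.
wrap = RFE(LogisticRegression(max_iter=1000), n_features_to_select=3).fit(X, y)
print("wrapper picks:", np.flatnonzero(wrap.get_support()))

# Embedded mode: relevancy emerges from optimizing the learner's own
# objective; the L1 penalty drives useless coefficients to zero.
emb = LogisticRegression(penalty="l1", solver="liblinear", C=0.1).fit(X, y)
print("embedded picks:", np.flatnonzero(emb.coef_[0] != 0))
```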
When heuristic methods are used in feature selection, the search for the optimal subset is carried out by trial: an evaluation function is built that provides, for each subset of real features, its ability to discriminate between classes [8, 36]. The results depend significantly on the heuristic adopted and on the number of points of the solution space effectively explored. Because of the underlying subjective assumptions, the heuristic approach is not fully reliable [1, 33]; however, it has the advantage of producing a ranking model for real-world features, thereby retaining human interpretability. Among the heuristic strategies we briefly describe the following: Gain Ratio, One-Rule and Relief-F.
The Gain Ratio algorithm [32] uses information entropy to measure how well a feature separates instances. The goodness of each individual feature depends on how broadly and uniformly it splits the considered data. Features are sorted from the most to the least relevant.
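A compact sketch of the Gain Ratio computation for a single categorical feature (plain Python with NumPy; an illustrative implementation of the criterion, not the code of [32]):

```python
import numpy as np

def entropy(labels):
    """Shannon entropy of a label vector, in bits."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def gain_ratio(feature, labels):
    """Information gain of `feature`, normalized by its split information."""
    values, counts = np.unique(feature, return_counts=True)
    weights = counts / counts.sum()
    # Expected entropy of the labels after splitting on the feature.
    cond = sum(w * entropy(labels[feature == v])
               for v, w in zip(values, weights))
    info_gain = entropy(labels) - cond
    # Split information penalizes overly broad, many-valued splits.
    split_info = -np.sum(weights * np.log2(weights))
    return info_gain / split_info if split_info > 0 else 0.0

feature = np.array(["a", "a", "b", "b", "b", "c"])
labels = np.array([0, 0, 1, 1, 0, 1])
print(f"gain ratio: {gain_ratio(feature, labels):.3f}")
```

The split information in the denominator is the correction Gain Ratio applies to plain information gain: it penalizes features that split the data into many small partitions.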