object belonging to a class, and each column represents a characterizing feature. From a geometric viewpoint, the objects can be thought of as points positioned in an m-dimensional feature space. The solution to a classification problem can then be seen as the procedure that finds the hyperplanes which, in the feature space, separate the classes of points.
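To make this geometric view concrete, the following minimal sketch (assuming Python with NumPy and scikit-learn, tools not prescribed by the text) fits a linear classifier to two classes of points in a two-dimensional feature space; the fitted coefficients define a separating hyperplane w·x + b = 0.

```python
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
# Two classes of points in a 2-dimensional feature space.
class_a = rng.normal(loc=[-2, -2], scale=0.8, size=(50, 2))
class_b = rng.normal(loc=[2, 2], scale=0.8, size=(50, 2))
X = np.vstack([class_a, class_b])
y = np.array([0] * 50 + [1] * 50)

# The fitted model defines a hyperplane w . x + b = 0 separating the classes.
clf = LinearSVC(C=1.0).fit(X, y)
w, b = clf.coef_[0], clf.intercept_[0]
print(f"hyperplane: {w[0]:.2f}*x1 + {w[1]:.2f}*x2 + {b:.2f} = 0")
```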
A broad group of FS techniques is based on the construction and ranking of new features [11, 16]. The Feature Extraction (FE) process transforms the original set of real features into linear combinations of them, so that the power to discriminate among classes is concentrated in a reduced number of extracted features. The relevancy of each individual feature is then evaluated: the set of eigenvalues associated with the transformation process represents the relative relevancy of each extracted feature and allows ranking them. It is important to notice, however, that the FE methodology was conceived primarily for data compression. It therefore effectively reduces the size of the initial volume of data, but it requires the entire dataset to be available in order to construct each extracted feature; the FE approach is thus of no help in applications where containing data acquisition costs is important. Furthermore, the FE model is very application specific, since the extracted features are uniquely associated with a dataset.
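A minimal sketch of this FE idea, under the assumption of a Python/NumPy implementation (our own illustration, not one given in the text): the eigendecomposition of the data covariance yields extracted features that are linear combinations of the original ones, and the eigenvalues rank their relative relevancy. Note how every original column is needed to construct each extracted feature, which is exactly why FE does not reduce data acquisition costs.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))             # 200 objects, 5 original features
X[:, 1] = 0.9 * X[:, 0] + 0.1 * X[:, 1]   # introduce correlation

# Covariance of the centered data: its eigenvectors define the linear
# combinations (extracted features), its eigenvalues their relevancy.
Xc = X - X.mean(axis=0)
cov = np.cov(Xc, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)

# Rank extracted features by decreasing eigenvalue and keep the top two.
order = np.argsort(eigvals)[::-1]
top = eigvecs[:, order[:2]]
Z = Xc @ top                              # reduced representation
print("eigenvalues (sorted):", np.round(eigvals[order], 3))
print("reduced shape:", Z.shape)
```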
For FS, three modes of application have been identified [6, 16, 29] in relation to the dependence on the classification algorithm. In the wrapper mode, selection and classification are iterated to refine the selection of features until an optimal performance of the classification algorithm is achieved. The exploration of the solution space can be conducted either by brute force or by heuristic approaches. The wrapper mode is supervised and is not suitable for real-time applications, although some solutions have been proposed that increase its performance while avoiding its procedural complexity [28]. By contrast, in the filter mode the features that satisfy a general criterion of relevancy for a classification process are selected. The filter method is applied in a single step, independently of the classification algorithm. In the embedded mode, FS is part of the model training process, and feature relevancy is obtained by analyzing the features' utility in optimizing the objective function of the learning model; an application example is given in [30]. From a practical point of view, these three methods represent different levels of trade-off between ease of execution and accuracy of the results.
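The three modes can be sketched side by side with off-the-shelf selectors (a Python/scikit-learn illustration under our own choice of algorithms, not one made by the text): a univariate filter, a wrapper that iterates selection with a classifier, and an embedded L1-penalized model whose training itself zeroes out irrelevant features.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif, RFE
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=300, n_features=10,
                           n_informative=3, random_state=0)

# Filter mode: one pass, independent of any classification algorithm.
filt = SelectKBest(score_func=f_classif, k=3).fit(X, y)
print("filter picks:", np.flatnonzero(filt.get_support()))

# Wrapper mode: selection and classification iterated together.
wrap = RFE(LogisticRegression(max_iter=1000), n_features_to_select=3).fit(X, y)
print("wrapper picks:", np.flatnonzero(wrap.get_support()))

# Embedded mode: relevancy emerges from optimizing the learner's own
# objective; the L1 penalty drives useless coefficients to zero.
emb = LogisticRegression(penalty="l1", solver="liblinear", C=0.1).fit(X, y)
print("embedded picks:", np.flatnonzero(emb.coef_[0] != 0))
```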
When heuristic methods are used in feature selection, the search for the optimal subset is carried out by trial: an evaluation function is built that provides, for each subset of real features, its ability to discriminate between classes [8, 36]. The results depend significantly on the heuristic adopted and on the number of points of the solution space effectively explored. Because of the underlying subjective assumptions, the heuristic approach is not fully reliable [1, 33]; however, it has the advantage of producing a ranking model for real-world features, thereby retaining human interpretability. Among the heuristic strategies we briefly describe the following: Gain Ratio, One-Rule and Relief-F.
The Gain Ratio algorithm [32] uses information entropy to measure how well a feature separates instances. The goodness of each individual feature depends on how broadly and uniformly it splits the considered data. Features are sorted from the most to the least relevant.
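A compact sketch of the Gain Ratio computation for a single categorical feature (plain Python with NumPy; an illustrative implementation of the criterion, not the code of [32]):

```python
import numpy as np

def entropy(labels):
    """Shannon entropy of a label vector, in bits."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def gain_ratio(feature, labels):
    """Information gain of `feature`, normalized by its split information."""
    values, counts = np.unique(feature, return_counts=True)
    weights = counts / counts.sum()
    # Expected entropy of the labels after splitting on the feature.
    cond = sum(w * entropy(labels[feature == v])
               for v, w in zip(values, weights))
    info_gain = entropy(labels) - cond
    # Split information penalizes overly broad, many-valued splits.
    split_info = -np.sum(weights * np.log2(weights))
    return info_gain / split_info if split_info > 0 else 0.0

feature = np.array(["a", "a", "b", "b", "b", "c"])
labels = np.array([0, 0, 1, 1, 0, 1])
print(f"gain ratio: {gain_ratio(feature, labels):.3f}")
```

The split information in the denominator is the correction Gain Ratio applies to plain information gain: it penalizes features that split the data into many small partitions.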