4.2.3 A Multiple-Challenge Case Study for Feature Ranking
The issues identified in the previous sections have been addressed separately in the scientific literature, but in practice they constitute a complex of challenges that must be tackled in an integrated manner, especially when pursuing efficiency and effectiveness in real applications and production environments.
The problem on which we focus is to obtain a new model for ranking features that combines effective FE methods with a representation model that is humanly understandable and can be integrated into domain knowledge. A further objective is to explore how well the efficacy of this new method generalizes and how it benefits from a modular architecture that allows a choice between alternative feature extraction methods, depending on the restrictions imposed by specific applications. In order to compare the quality of the new model, and of its possible variants, against the classical methods, it is necessary to identify suitable performance metrics and a benchmarking methodology that uses reference datasets. At the same time, the possibility of obtaining cost-benefit functions of the features for use in decision-making has to be explored.
4.3 Focus on Feature Extraction Based Ranking
4.3.1 Linear Models
Many known techniques of Feature Extraction (FE) differ in the principle underlying the detection of an optimal new set of features. However, all of them show an underlying unity in the computation of a geometric transformation, algebraically expressed as a projection (or mapping) matrix.
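As a minimal numerical sketch (the data and the matrix $U$ here are hypothetical placeholders), applying such a mapping reduces to a single matrix product:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))   # 100 samples with 10 original features
U = rng.normal(size=(10, 3))     # projection (mapping) matrix, d x k

X_new = X @ U                    # each row is a sample in the new 3-feature space
print(X_new.shape)               # (100, 3)
```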
In Linear Discriminant Analysis (LDA), where linear separability of classes is assumed, the principle underlying the detection of a new feature is the maximization of the ratio of the between-class variance to the within-class variance along that feature. A set of new features is therefore obtained by maximizing the ratio of the between-class covariance matrix $S_b$ to the within-class covariance matrix $S_w$. The projection matrix is the eigenvector matrix $U$ obtained by solving the generalized eigenvalue problem $S_b U = S_w U \Lambda$, where $\Lambda$ is a diagonal matrix whose entries are the generalized eigenvalues. Each eigenvalue $\lambda_i$ measures the relative capability of the corresponding new feature $u_i$ to separate the classes.
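As a minimal sketch of this computation, assuming a data matrix X with one sample per row and a class-label vector y, and assuming $S_w$ is non-singular, the generalized symmetric eigenproblem can be solved directly with scipy.linalg.eigh:

```python
import numpy as np
from scipy.linalg import eigh

def lda_projection(X, y):
    """Solve S_b U = S_w U Lambda for the LDA projection matrix U.

    Returns the eigenvalues (per-feature separability scores) and the
    eigenvector matrix U, both sorted by decreasing eigenvalue.
    """
    mean_all = X.mean(axis=0)
    d = X.shape[1]
    S_w = np.zeros((d, d))
    S_b = np.zeros((d, d))
    for c in np.unique(y):
        Xc = X[y == c]
        mean_c = Xc.mean(axis=0)
        S_w += (Xc - mean_c).T @ (Xc - mean_c)    # within-class scatter
        diff = (mean_c - mean_all).reshape(-1, 1)
        S_b += Xc.shape[0] * (diff @ diff.T)      # between-class scatter
    # Generalized symmetric eigenproblem; assumes S_w is non-singular.
    eigvals, U = eigh(S_b, S_w)
    order = np.argsort(eigvals)[::-1]             # largest lambda_i first
    return eigvals[order], U[:, order]
```

Sorting by decreasing $\lambda_i$ yields the ranking of the new features induced by the eigenvalues.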
A limitation of the classic LDA algorithm is that both the $S_w$ and $S_b$ matrices must be non-singular in order to preserve the orthonormality of the mapping. For this reason, several variants of the classic algorithm have been proposed to overcome the singularity problem. In particular, in this work we consider the Orthogonal Linear Discriminant Analysis (OLDA) algorithm [37]. This algorithm uses Singular Value Decomposition to obtain a non-singular approximation of $S_w^{-1} S_b$. When the $S_w$ and $S_b$ matrices are non-singular, OLDA and classic LDA give identical results.
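As an illustrative sketch only (not the exact OLDA procedure of [37]; the function name and the parameter k are hypothetical), the singularity problem can be worked around by substituting the SVD-based Moore-Penrose pseudoinverse for $S_w^{-1}$ and restoring an orthonormal mapping with a QR step:

```python
import numpy as np

def olda_like_projection(S_b, S_w, k):
    """Sketch of an OLDA-style transform: replace S_w^{-1} with the
    SVD-based pseudoinverse when S_w is singular, then orthogonalize.

    Note: this illustrates the idea only; the exact OLDA algorithm
    of [37] differs in how the transformation is derived.
    """
    # The pseudoinverse (computed internally via SVD) is a usable
    # surrogate for S_w^{-1} even when S_w is singular.
    M = np.linalg.pinv(S_w) @ S_b
    eigvals, eigvecs = np.linalg.eig(M)
    order = np.argsort(eigvals.real)[::-1]
    G = eigvecs[:, order[:k]].real           # top-k discriminant directions
    Q, _ = np.linalg.qr(G)                   # orthonormalize the mapping
    return Q
```

When $S_w$ is non-singular, the pseudoinverse coincides with the ordinary inverse, which is consistent with OLDA and classic LDA giving identical results in that case.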
 