4.2.3 A Multiple-Challenge Case Study for Feature Ranking
The issues identified in the previous sections have been addressed separately in the scientific literature, but in practice they constitute a complex of challenges that must be tackled in an integrated manner, especially when pursuing efficiency and effectiveness in real applications and production environments.
The problem on which we focus is to obtain a new model for ranking features that combines effective FE methods with a representation model that is humanly understandable and can be integrated into domain knowledge. A further objective is to explore how well the efficacy of this new method generalizes and how it benefits from a modular architecture that allows a choice between alternative feature extraction methods, depending on the restrictions imposed by specific applications. In order to compare the quality of the new model, and of its possible variants, against the classical methods, it is necessary to identify suitable performance metrics and a benchmarking methodology that uses reference datasets. At the same time, the possibility of obtaining cost-benefit functions of the features for use in decision-making has to be explored.
4.3 Focus on Feature Extraction Based Ranking
4.3.1 Linear Models
Many known techniques of Feature Extraction (FE) differ in the principle underlying the detection of an optimal new set of features. However, all of them show an underlying unity in the computation of a geometric transformation, algebraically expressed as a projection (or mapping) matrix.
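As a minimal numerical sketch (the data and the matrix $U$ here are hypothetical placeholders), applying such a mapping reduces to a single matrix product:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))   # 100 samples with 10 original features
U = rng.normal(size=(10, 3))     # projection (mapping) matrix, d x k

X_new = X @ U                    # each row is a sample in the new 3-feature space
print(X_new.shape)               # (100, 3)
```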
In Linear Discriminant Analysis (LDA), where linear separability of classes is assumed, the principle underlying the detection of a new feature is the maximization of the ratio of the between-class variance to the within-class variance along that feature. A set of new features is therefore obtained by maximizing the ratio of the between-class covariance matrix $S_b$ to the within-class covariance matrix $S_w$. The projection matrix is the eigenvector matrix $U$ obtained by solving the generalized eigenvalue problem $S_b U = S_w U \Lambda$, where $\Lambda$ is a diagonal matrix whose entries are the generalized eigenvalues. Each eigenvalue $\lambda_i$ measures the relative capability of the corresponding new feature $u_i$ to separate the classes.
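As a minimal sketch of this computation, assuming a data matrix X with one sample per row and a class-label vector y, and assuming $S_w$ is non-singular, the generalized symmetric eigenproblem can be solved directly with scipy.linalg.eigh:

```python
import numpy as np
from scipy.linalg import eigh

def lda_projection(X, y):
    """Solve S_b U = S_w U Lambda for the LDA projection matrix U.

    Returns the eigenvalues (per-feature separability scores) and the
    eigenvector matrix U, both sorted by decreasing eigenvalue.
    """
    mean_all = X.mean(axis=0)
    d = X.shape[1]
    S_w = np.zeros((d, d))
    S_b = np.zeros((d, d))
    for c in np.unique(y):
        Xc = X[y == c]
        mean_c = Xc.mean(axis=0)
        S_w += (Xc - mean_c).T @ (Xc - mean_c)    # within-class scatter
        diff = (mean_c - mean_all).reshape(-1, 1)
        S_b += Xc.shape[0] * (diff @ diff.T)      # between-class scatter
    # Generalized symmetric eigenproblem; assumes S_w is non-singular.
    eigvals, U = eigh(S_b, S_w)
    order = np.argsort(eigvals)[::-1]             # largest lambda_i first
    return eigvals[order], U[:, order]
```

Sorting by decreasing $\lambda_i$ yields the ranking of the new features induced by the eigenvalues.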
A limitation of the classic LDA algorithm is that both the $S_w$ and $S_b$ matrices must be non-singular in order to preserve the orthonormality of the mapping. For this reason, several variants of the classic algorithm have been proposed to overcome the singularity problem. In particular, in this work we consider the Orthogonal Linear Discriminant Analysis (OLDA) algorithm [37]. This algorithm uses Singular Value Decomposition to obtain a non-singular approximation of $S_w^{-1} S_b$. When the $S_w$ and $S_b$ matrices are non-singular, OLDA and classic LDA give identical results.
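As an illustrative sketch only (not the exact OLDA procedure of [37]; the function name and the parameter k are hypothetical), the singularity problem can be worked around by substituting the SVD-based Moore-Penrose pseudoinverse for $S_w^{-1}$ and restoring an orthonormal mapping with a QR step:

```python
import numpy as np

def olda_like_projection(S_b, S_w, k):
    """Sketch of an OLDA-style transform: replace S_w^{-1} with the
    SVD-based pseudoinverse when S_w is singular, then orthogonalize.

    Note: this illustrates the idea only; the exact OLDA algorithm
    of [37] differs in how the transformation is derived.
    """
    # The pseudoinverse (computed internally via SVD) is a usable
    # surrogate for S_w^{-1} even when S_w is singular.
    M = np.linalg.pinv(S_w) @ S_b
    eigvals, eigvecs = np.linalg.eig(M)
    order = np.argsort(eigvals.real)[::-1]
    G = eigvecs[:, order[:k]].real           # top-k discriminant directions
    Q, _ = np.linalg.qr(G)                   # orthonormalize the mapping
    return Q
```

When $S_w$ is non-singular, the pseudoinverse coincides with the ordinary inverse, which is consistent with OLDA and classic LDA giving identical results in that case.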
 