With the proliferation of extremely high-dimensional data, two issues arise at the
same time: FS becomes indispensable in any learning process, and the efficiency and
stability of FS algorithms can no longer be neglected. One of the earlier studies regarding
this issue can be found in [21]. The reduction of the FS task to a quadratic optimiza-
tion problem is addressed in [46]. In that paper, the authors presented Quadratic
Programming FS (QPFS), which uses the Nyström method for approximate matrix diag-
onalization, making it possible to deal with very large data sets. In their experiments,
it outperformed mRMR and ReliefF under two evaluation criteria: Pearson's corre-
lation coefficient and MI. In the presence of a huge number of irrelevant features
and complex data distributions, a local-learning-based approach can be useful [53].
A prior stage that eliminates class-dependent, density-based features before the
feature ranking process can alleviate the effects of high-dimensional data sets [19].
Finally, and closely related to the emerging Big Data solutions for large-scale busi-
ness data, there is a recent approach for massively parallel FS described in [63].
High-performance distributed computing architectures, such as Message Passing
Interface (MPI) and MapReduce, are being applied to scale all kinds of algorithms
to large data problems.
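To make the idea of casting FS as a quadratic optimization problem more concrete, the following sketch builds a QPFS-style objective in which a feature weight vector trades off redundancy (pairwise feature correlation) against relevance (correlation with the class). It is a simplified illustration, not the exact method of [46]: the names (qpfs_weights, alpha) are ours, Pearson's correlation stands in for the MI-based criteria, and the Nyström approximation used there for large matrices is omitted.

import numpy as np
from scipy.optimize import minimize

def qpfs_weights(X, y, alpha=0.5):
    d = X.shape[1]
    # Redundancy term Q: absolute Pearson correlation between every pair of features.
    Q = np.abs(np.corrcoef(X, rowvar=False))
    # Relevance term f: absolute Pearson correlation of each feature with the class.
    f = np.abs([np.corrcoef(X[:, j], y)[0, 1] for j in range(d)])
    # Minimize (1 - alpha) * x'Qx - alpha * f'x over the probability simplex.
    objective = lambda x: (1.0 - alpha) * x @ Q @ x - alpha * (f @ x)
    constraint = {'type': 'eq', 'fun': lambda x: x.sum() - 1.0}
    res = minimize(objective, np.full(d, 1.0 / d), method='SLSQP',
                   bounds=[(0.0, 1.0)] * d, constraints=[constraint])
    return res.x  # larger weight: more relevant, less redundant feature

# Toy usage: features 0 and 1 generate the target, the rest are noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.1, size=200)
print(np.argsort(qpfs_weights(X, y))[::-1])  # feature indices ranked by weight

Once the weights are obtained, selecting the top-ranked features proceeds as in any other ranking-based FS method.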
When class labels of the data are available, we can use supervised FS; otherwise,
unsupervised FS is the appropriate choice. This family of methods usually involves the
maximization of a clustering performance measure or the selection of features based on feature
dependence, correlation and relevance. The basic principle is to remove those features
carrying little or no additional information beyond that subsumed by the rest of the fea-
tures. For instance, the proposal presented in [35] uses feature dependency/similarity
for redundancy reduction, without requiring any search process. The process follows
a partitioning of the features into clusters and is governed by a similarity measure
called the maximal information compression index. Another algorithm for unsupervised
FS is the forward orthogonal search (FOS) [59], whose goal is to maximize the
overall dependency on the data in order to detect significant variables. Ensemble learning
has also been used in unsupervised FS [13]. In clustering, Feature Weighting has also
been applied with promising results [36].
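As an illustration of this kind of similarity-driven redundancy removal, the sketch below computes the maximal information compression index of a feature pair (the smaller eigenvalue of its 2 x 2 covariance matrix, which is zero exactly when the pair is linearly dependent) and greedily discards features that are almost linearly dependent on an already retained one. The greedy threshold-based pass and the names (max_info_compression, remove_redundant, threshold) are simplifications of ours; the original proposal in [35] clusters the features using nearest-neighbor distances under this measure instead.

import numpy as np

def max_info_compression(x, y):
    # Smaller eigenvalue of the 2x2 covariance matrix of the pair:
    # it is 0 if and only if x and y are linearly dependent.
    return np.linalg.eigvalsh(np.cov(x, y))[0]

def remove_redundant(X, threshold=0.05):
    kept = []
    for j in range(X.shape[1]):
        # Keep feature j only if it is not (nearly) linearly dependent
        # on some feature that has already been retained.
        if all(max_info_compression(X[:, j], X[:, k]) > threshold for k in kept):
            kept.append(j)
    return kept

# Toy usage: the appended sixth column is a noisy copy of the first one.
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 5))
X = np.hstack([X, 2.0 * X[:, [0]] + 0.01 * rng.normal(size=(100, 1))])
print(remove_redundant(X))  # the redundant last column is discarded

Note that no class labels appear anywhere in this procedure, which is what makes it applicable in the unsupervised setting.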
7.5.2 Feature Extraction
In feature extraction, we are interested in finding new features that are calculated as
a function of the original features. In this context, DR is a mapping of a multidimen-
sional space into a space of fewer dimensions.
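As a reminder of what such a mapping looks like in practice, the brief sketch below applies PCA (one of the DR techniques covered in Chap. 6) to build three new features as linear combinations of twenty original ones; the synthetic data and the number of components are arbitrary choices for illustration.

import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(2)
X = rng.normal(size=(150, 20))   # 150 instances described by 20 original features
pca = PCA(n_components=3)        # map the 20-dimensional space into 3 dimensions
Z = pca.fit_transform(X)         # extracted features, computed from all original ones
print(Z.shape)                               # (150, 3)
print(pca.explained_variance_ratio_.sum())   # variance retained by the mapping

Unlike FS, none of the three resulting columns coincides with any original feature, which is precisely the distinction drawn in this section.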
The reader should be reminded that in Chap. 6 we denoted these techniques
as DR techniques. The rationale behind this is that the literature has adopted this
term to a greater extent than feature extraction, although both designations are correct.
In fact, FS is a sub-family of the DR techniques, which seems logical. In this
book, we have preferred to separate FS from the general DR task due to its influence
on the research community. Furthermore, the aim of this section is to establish a link
between the corresponding sections of Chap. 6 and the FS task.
 