With the proliferation of extremely high-dimensional data, two issues arise at the
same time: FS becomes indispensable in any learning process, and the efficiency and
stability of FS algorithms can no longer be neglected. One of the earlier studies regarding
this issue can be found in [21]. The reduction of the FS task to a quadratic optimiza-
tion problem is addressed in [46]. In that paper, the authors presented Quadratic
Programming FS (QPFS), which uses the Nyström method for approximate matrix diag-
onalization, making it possible to deal with very large data sets. In their experiments,
it outperformed mRMR and ReliefF under two evaluation criteria: Pearson's corre-
lation coefficient and MI. In the presence of a huge number of irrelevant features
and complex data distributions, a local-learning-based approach can be useful [53].
A prior stage that eliminates class-dependent, density-based features before the
feature ranking process can alleviate the effects of high-dimensional data sets [19].
Finally, and closely related to the emerging Big Data solutions for large-scale busi-
ness data, there is a recent approach for massively parallel FS described in [63].
High-performance distributed computing architectures, such as Message Passing
Interface (MPI) and MapReduce, are being applied to scale all kinds of algorithms
to large data problems.
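To make the idea of casting FS as a quadratic optimization problem more concrete, the following sketch builds a QPFS-style objective in which a feature weight vector trades off redundancy (pairwise feature correlation) against relevance (correlation with the class). It is a simplified illustration, not the exact method of [46]: the names (qpfs_weights, alpha) are ours, Pearson's correlation stands in for the MI-based criteria, and the Nyström approximation used there for large matrices is omitted.

import numpy as np
from scipy.optimize import minimize

def qpfs_weights(X, y, alpha=0.5):
    d = X.shape[1]
    # Redundancy term Q: absolute Pearson correlation between every pair of features.
    Q = np.abs(np.corrcoef(X, rowvar=False))
    # Relevance term f: absolute Pearson correlation of each feature with the class.
    f = np.abs([np.corrcoef(X[:, j], y)[0, 1] for j in range(d)])
    # Minimize (1 - alpha) * x'Qx - alpha * f'x over the probability simplex.
    objective = lambda x: (1.0 - alpha) * x @ Q @ x - alpha * (f @ x)
    constraint = {'type': 'eq', 'fun': lambda x: x.sum() - 1.0}
    res = minimize(objective, np.full(d, 1.0 / d), method='SLSQP',
                   bounds=[(0.0, 1.0)] * d, constraints=[constraint])
    return res.x  # larger weight: more relevant, less redundant feature

# Toy usage: features 0 and 1 generate the target, the rest are noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.1, size=200)
print(np.argsort(qpfs_weights(X, y))[::-1])  # feature indices ranked by weight

Once the weights are obtained, selecting the top-ranked features proceeds as in any other ranking-based FS method.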
When class labels of the data are available, we can use supervised FS; otherwise,
unsupervised FS is the appropriate choice. This family of methods usually involves the
maximization of a clustering performance measure or the selection of features based on feature
dependence, correlation and relevance. The basic principle is to remove those features
carrying little or no additional information beyond that subsumed by the rest of the fea-
tures. For instance, the proposal presented in [35] uses feature dependency/similarity
for redundancy reduction, without requiring any search process. The process follows
a partitioning of the features into clusters and is governed by a similarity measure
called the maximal information compression index. Another algorithm for unsupervised
FS is the forward orthogonal search (FOS) [59], whose goal is to maximize the
overall dependency on the data in order to detect significant variables. Ensemble learning
has also been used in unsupervised FS [13]. In clustering, Feature Weighting has also
been applied with promising results [36].
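As an illustration of this kind of similarity-driven redundancy removal, the sketch below computes the maximal information compression index of a feature pair (the smaller eigenvalue of its 2 x 2 covariance matrix, which is zero exactly when the pair is linearly dependent) and greedily discards features that are almost linearly dependent on an already retained one. The greedy threshold-based pass and the names (max_info_compression, remove_redundant, threshold) are simplifications of ours; the original proposal in [35] clusters the features using nearest-neighbor distances under this measure instead.

import numpy as np

def max_info_compression(x, y):
    # Smaller eigenvalue of the 2x2 covariance matrix of the pair:
    # it is 0 if and only if x and y are linearly dependent.
    return np.linalg.eigvalsh(np.cov(x, y))[0]

def remove_redundant(X, threshold=0.05):
    kept = []
    for j in range(X.shape[1]):
        # Keep feature j only if it is not (nearly) linearly dependent
        # on some feature that has already been retained.
        if all(max_info_compression(X[:, j], X[:, k]) > threshold for k in kept):
            kept.append(j)
    return kept

# Toy usage: the appended sixth column is a noisy copy of the first one.
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 5))
X = np.hstack([X, 2.0 * X[:, [0]] + 0.01 * rng.normal(size=(100, 1))])
print(remove_redundant(X))  # the redundant last column is discarded

Note that no class labels appear anywhere in this procedure, which is what makes it applicable in the unsupervised setting.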
7.5.2 Feature Extraction
In feature extraction, we are interested in finding new features that are calculated as
a function of the original features. In this context, DR is a mapping of a multidimen-
sional space into a space of fewer dimensions.
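As a reminder of what such a mapping looks like in practice, the brief sketch below applies PCA (one of the DR techniques covered in Chap. 6) to build three new features as linear combinations of twenty original ones; the synthetic data and the number of components are arbitrary choices for illustration.

import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(2)
X = rng.normal(size=(150, 20))   # 150 instances described by 20 original features
pca = PCA(n_components=3)        # map the 20-dimensional space into 3 dimensions
Z = pca.fit_transform(X)         # extracted features, computed from all original ones
print(Z.shape)                               # (150, 3)
print(pca.explained_variance_ratio_.sum())   # variance retained by the mapping

Unlike FS, none of the three resulting columns coincides with any original feature, which is precisely the distinction drawn in this section.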
The reader should be reminded that in Chap. 6 we denoted these techniques
as DR techniques. The rationale behind this is that the literature has adopted this
term to a greater extent than feature extraction, although both designations are correct.
In fact, FS is a sub-family of the DR techniques, which seems logical. In this
book, we have preferred to separate FS from the general DR task due to its influence
on the research community. Furthermore, the aim of this section is to establish a link
between the corresponding sections of Chap. 6 and the FS task.
 