improves our ability to study real phenomena, but on the other hand huge amounts
of data produce an "information overload", raising data acquisition and processing
costs without effectively exploiting the information. Moreover, most machine
learning techniques suffer from the so-called "curse of dimensionality", and the
models these techniques generate can be difficult for humans to interpret in
high-dimensional spaces. To address these issues, the adoption of Feature Selection
(FS) in such processes is attracting increasing interest.
Decision making and operations in modern production contexts require an FS
method that is valid across applications, hence robust and flexible, and able
to operate interactively in a dynamic information environment, dealing effectively
with the challenges posed by data heterogeneity, data bandwidth and real-time
requirements. The wide availability of information is itself a challenge, because of
the exponential growth of data acquisition costs and, not least, the energy
consumed by computers and acquisition sensor systems. The FS process is a
complex decisional mechanism in which the accuracy of results is as important
as usability, speed, robustness and scalability. In the scientific literature, current
approaches to FS in the machine learning process offer distinct solutions that
address specific issues and exhibit complementary advantages, yet many practical
issues arising from applications in production contexts have never been considered
as a whole. This is the context that inspired the design and validation of
our novel Feature Ranking (FR) method, which supports FS. This chapter proposes
an innovative approach to FR that combines advantages otherwise dispersed over a
variety of distinct methods. Our research pursues two main objectives: the
first is to obtain feature rankings that lead to high accuracy in achieving machine
learning goals; the second is to provide an algorithm capable of actively
considering cost functions to support decision making. These issues have been
studied in relation to one of the best-known machine learning tasks: classification.
4.2 Feature Ranking for Classification:
The Background Picture
4.2.1 Intrinsic Discriminant Dimension
of a Classification Task
In the literature, FS refers to the problem of selecting a subset of relevant features
for building robust learning models [19, 27]. The concept of an optimal feature subset
has been refined over the years through a growing understanding of the dataset
properties that condition classification performance. As in generic data collections,
many features are insignificant for reaching a learning objective. A definition of
a relevant feature is provided by [3]: a feature x_i is strongly relevant to dataset X if
there exist examples A and B in X that differ only in their assignment to x_i and
have different labels. A feature x_i is weakly relevant to classification accuracy if it
is possible to remove a subset of the other features so that x_i becomes strongly relevant.
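The strong-relevance condition can be checked directly on a small dataset by comparing all pairs of examples. The following sketch is purely illustrative of the definition from [3] (the function name, data layout and toy dataset are our own assumptions, not part of the cited work):

```python
from itertools import combinations

def strongly_relevant(X, y, i):
    """Illustrative check of strong relevance: feature i is strongly
    relevant if two examples differ only in feature i yet have
    different labels. X is a list of feature tuples, y the labels."""
    for (a, xa), (b, xb) in combinations(enumerate(X), 2):
        # Indices of features on which the two examples disagree.
        differ = [j for j in range(len(xa)) if xa[j] != xb[j]]
        if differ == [i] and y[a] != y[b]:
            return True
    return False

# Toy dataset: feature 0 alone determines the label; feature 1 does not.
X = [(0, 0), (1, 0), (0, 1), (1, 1)]
y = [0, 1, 0, 1]
print(strongly_relevant(X, y, 0))  # True
print(strongly_relevant(X, y, 1))  # False
```

Note that this brute-force pairwise scan is quadratic in the number of examples; it serves only to make the definition concrete, not as a practical FS procedure.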