improves our ability to study real phenomena, but on the other hand huge amounts
of data produce an "information overload", raising data acquisition and processing
costs without effectively exploiting the information. Moreover, most machine
learning techniques suffer from the so-called "curse of dimensionality", and the
models these techniques generate can be difficult for humans to interpret in
high-dimensional spaces. To address these issues, the adoption of Feature Selection
(FS) in such processes is attracting increasing interest.
Decision making and operations in modern production contexts require an FS
method that is valid across applications, hence robust and flexible, and able
to operate interactively in a dynamic information environment, dealing effectively
with the challenges posed by data heterogeneity, data bandwidth and real-time
requirements. The wide availability of information is itself a challenge, because of
the exponential growth of data acquisition costs and, not least, the energy
consumed by computers and acquisition sensor systems. The FS process is a
complex decisional mechanism in which the accuracy of results is as important
as usability, speed, robustness and scalability. In the scientific literature, current
approaches to FS in the machine learning process offer distinct solutions that
address specific issues and exhibit complementary advantages, yet many practical
issues arising from applications in production contexts have never been considered
as a whole. This is the context that inspired the design and validation of
our novel Feature Ranking (FR) method, which supports FS. This chapter proposes
an innovative approach to FR that combines advantages otherwise dispersed over a
variety of distinct methods. Our research pursues two main objectives: the
first is to obtain feature rankings that lead to high accuracy in achieving machine
learning goals; the second is to provide an algorithm capable of actively
considering cost functions to support decision making. These issues have been
studied in relation to one of the best-known machine learning tasks: classification.
4.2 Feature Ranking for Classification:
The Background Picture
4.2.1 Intrinsic Discriminant Dimension
of a Classification Task
In the literature, FS refers to the problem of selecting a subset of relevant features
for building robust learning models [19, 27]. The concept of an optimal feature subset
has been refined over the years through a growing understanding of the dataset
properties that condition classification performance. As in generic data collections,
many features are insignificant for reaching a learning objective. A definition of
a relevant feature is provided by [3]: a feature x_i is strongly relevant to dataset X if
there exist examples A and B in X that differ only in their assignment to x_i and
have different labels. A feature x_i is weakly relevant to classification accuracy if it
is possible to remove a subset of the other features so that x_i becomes strongly relevant.
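The strong-relevance condition can be checked directly on a small dataset by comparing all pairs of examples. The following sketch is purely illustrative of the definition from [3] (the function name, data layout and toy dataset are our own assumptions, not part of the cited work):

```python
from itertools import combinations

def strongly_relevant(X, y, i):
    """Illustrative check of strong relevance: feature i is strongly
    relevant if two examples differ only in feature i yet have
    different labels. X is a list of feature tuples, y the labels."""
    for (a, xa), (b, xb) in combinations(enumerate(X), 2):
        # Indices of features on which the two examples disagree.
        differ = [j for j in range(len(xa)) if xa[j] != xb[j]]
        if differ == [i] and y[a] != y[b]:
            return True
    return False

# Toy dataset: feature 0 alone determines the label; feature 1 does not.
X = [(0, 0), (1, 0), (0, 1), (1, 1)]
y = [0, 1, 0, 1]
print(strongly_relevant(X, y, 0))  # True
print(strongly_relevant(X, y, 1))  # False
```

Note that this brute-force pairwise scan is quadratic in the number of examples; it serves only to make the definition concrete, not as a practical FS procedure.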