learning rule establish the degrees of importance or relevance of features. Such
examples can be multiplied.
Even for working solutions it is worthwhile to study attributes, as it is not beyond the realm of possibility that some of them are redundant, repetitive, or even irrelevant, or that alternatives of equal merit exist. Once such variables are discovered, a different selection can improve performance, if not in classification accuracy, then through a better understanding of the analysed concepts and possibly a more explicit presentation of information [23].
With all these factors and avenues to explore, it is not surprising that the problem of feature selection, in the various meanings of this expression, is actively pursued in research, which has given us the motivation for dedicating this book to this area.
1.2 Chapters of the Book
The 13 chapters included in this volume are grouped into four parts. What follows
is a short description of the content for each chapter.
Part I Estimation of Feature Importance
Chapter 2 is devoted to a review of the field of all-relevant feature selection and a presentation of a representative algorithm [5, 25]. The problem of all-relevant feature selection is first defined, then the key algorithms are described. Finally, the Boruta algorithm is explained in greater detail and applied to collections of both synthetic and real-world data sets, with comments on its performance, properties, and parameters.
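The core idea behind Boruta-style all-relevant selection can be conveyed in a short sketch. This is not code from the chapter, but a minimal illustration, assuming scikit-learn and a random-forest importance measure: each real feature is compared against "shadow" features obtained by permuting the real columns, and a feature is credited a hit whenever it outscores the best shadow.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def shadow_feature_hits(X, y, n_rounds=5, random_state=0):
    """Count how often each real feature outscores the best shadow feature."""
    rng = np.random.default_rng(random_state)
    n_features = X.shape[1]
    hits = np.zeros(n_features, dtype=int)
    for round_idx in range(n_rounds):
        # Shadow features: each column of X permuted independently,
        # destroying any relation to y while keeping marginal distributions.
        shadows = rng.permuted(X, axis=0)
        extended = np.hstack([X, shadows])
        forest = RandomForestClassifier(n_estimators=100,
                                        random_state=random_state + round_idx)
        forest.fit(extended, y)
        importances = forest.feature_importances_
        best_shadow = importances[n_features:].max()
        hits += importances[:n_features] > best_shadow
    return hits  # consistently high counts suggest all-relevant features
```

The full Boruta algorithm additionally applies a statistical test to the hit counts over many rounds to decide confirmation or rejection of each feature; the sketch above shows only the shadow-comparison mechanism.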
Chapter 3 illustrates the three approaches to feature selection and reduction [17]: filters, wrappers, and embedded solutions [25], combined for the purpose of feature evaluation. These approaches are used when domain knowledge is unavailable or insufficient for an informed choice, or to support such expert knowledge in order to achieve higher efficiency, enhanced classification, or classifiers of reduced size. The classification task under study is authorship attribution with balanced data.
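The contrast between the first two of these approaches can be sketched briefly. The following is an illustrative example rather than code from the chapter, assuming scikit-learn: a filter scores features by a model-free statistic (here mutual information), while a wrapper scores a candidate feature subset by the cross-validated accuracy of an actual classifier.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def filter_ranking(X, y):
    # Filter: score each feature by mutual information with the class,
    # independently of any classifier.
    scores = mutual_info_classif(X, y, random_state=0)
    return np.argsort(scores)[::-1]  # best-scoring feature first

def wrapper_score(X, y, subset):
    # Wrapper: score a candidate subset by the cross-validated accuracy
    # of a classifier trained on just those columns.
    clf = LogisticRegression(max_iter=1000)
    return cross_val_score(clf, X[:, list(subset)], y, cv=3).mean()
```

An embedded solution would instead obtain feature relevance as a by-product of training, e.g. from the coefficients of a regularised linear model, without a separate search loop.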
Chapter 4 presents a method of feature ranking that calculates the relative weight of features in their original domain with an algorithmic procedure [3]. The method supports informed selection of real-world features and is useful when the number of features has cost implications. At its core is a feature extraction technique based on the effective decision boundary feature matrix, extended to calculate the total weight of the real features through a geometrically justified procedure [28].
Chapter 5 focuses on weighting characteristic features by processes of their sequential selection. A set of all accessible attributes can be reduced backward, or variables, examined one by one, can be selected forward. The choice can be conditioned by the performance of a classification system, in a wrapper model, and the observations with respect to selected variables can result in assignment