learning rule establish the degrees of importance or relevance of features. Such
examples can be multiplied.
Even for working solutions it is worthwhile to study attributes, as it is not beyond the realm of possibility that some of them are redundant, repetitive, or even irrelevant, or that alternatives of equal merit exist. Once such variables are discovered, a different selection can improve performance, if not in classification accuracy, then through a better understanding of the analysed concepts and possibly a more explicit presentation of information [23].
With all these factors and avenues to explore, it is not surprising that the problem of feature selection, in the various meanings of this expression, is actively pursued in research, which has given us the motivation for dedicating this book to this area.
1.2 Chapters of the Book
The 13 chapters included in this volume are grouped into four parts. What follows
is a short description of the content for each chapter.
Part I Estimation of Feature Importance
Chapter 2 is devoted to a review of the field of all-relevant feature selection and a presentation of a representative algorithm [5, 25]. The problem of all-relevant feature selection is first defined, then the key algorithms are described. Finally, the Boruta algorithm is explained in greater detail and applied to collections of both synthetic and real-world data sets, with comments on its performance, properties, and parameters.
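The core idea behind Boruta-style all-relevant selection can be conveyed in a short sketch. This is not code from the chapter, but a minimal illustration, assuming scikit-learn and a random-forest importance measure: each real feature is compared against "shadow" features obtained by permuting the real columns, and a feature is credited a hit whenever it outscores the best shadow.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def shadow_feature_hits(X, y, n_rounds=5, random_state=0):
    """Count how often each real feature outscores the best shadow feature."""
    rng = np.random.default_rng(random_state)
    n_features = X.shape[1]
    hits = np.zeros(n_features, dtype=int)
    for round_idx in range(n_rounds):
        # Shadow features: each column of X permuted independently,
        # destroying any relation to y while keeping marginal distributions.
        shadows = rng.permuted(X, axis=0)
        extended = np.hstack([X, shadows])
        forest = RandomForestClassifier(n_estimators=100,
                                        random_state=random_state + round_idx)
        forest.fit(extended, y)
        importances = forest.feature_importances_
        best_shadow = importances[n_features:].max()
        hits += importances[:n_features] > best_shadow
    return hits  # consistently high counts suggest all-relevant features
```

The full Boruta algorithm additionally applies a statistical test to the hit counts over many rounds to decide confirmation or rejection of each feature; the sketch above shows only the shadow-comparison mechanism.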
Chapter 3 illustrates the three approaches to feature selection and reduction [17]: filters, wrappers, and embedded solutions [25], combined for the purpose of feature evaluation. These approaches are used when domain knowledge is unavailable or insufficient for an informed choice, or to support such expert knowledge in order to achieve higher efficiency, enhanced classification, or classifiers of reduced size. The classification task under study is authorship attribution with balanced data.
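The contrast between the first two of these approaches can be sketched briefly. The following is an illustrative example rather than code from the chapter, assuming scikit-learn: a filter scores features by a model-free statistic (here mutual information), while a wrapper scores a candidate feature subset by the cross-validated accuracy of an actual classifier.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def filter_ranking(X, y):
    # Filter: score each feature by mutual information with the class,
    # independently of any classifier.
    scores = mutual_info_classif(X, y, random_state=0)
    return np.argsort(scores)[::-1]  # best-scoring feature first

def wrapper_score(X, y, subset):
    # Wrapper: score a candidate subset by the cross-validated accuracy
    # of a classifier trained on just those columns.
    clf = LogisticRegression(max_iter=1000)
    return cross_val_score(clf, X[:, list(subset)], y, cv=3).mean()
```

An embedded solution would instead obtain feature relevance as a by-product of training, e.g. from the coefficients of a regularised linear model, without a separate search loop.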
Chapter 4 presents a method of feature ranking that calculates the relative weight of features in their original domain with an algorithmic procedure [3]. The method supports informed selection of real-world features and is useful when the number of features has cost implications. At its core is a feature extraction technique based on the effective decision boundary feature matrix, extended to calculate the total weight of the real features through a geometrically justified procedure [28].
Chapter 5 focuses on weighting characteristic features by processes of their sequential selection. A set of all accessible attributes can be reduced backward, or variables, examined one by one, can be selected forward. The choice can be conditioned by the performance of a classification system, in a wrapper model, and the observations with respect to selected variables can result in assignment