In a classification task, FS is used to predict the so-called intrinsic discriminant
dimension of the dataset, defined by Lee and Landgrebe [24] as
the smallest dimensional subspace in which the same classification accuracy can be
obtained as in the original space. The effects of FS on accuracy have
more recently been studied by Sima et al. [34]. In [21, 35], the problem of FS is
seen as a trade-off between generalization and specialization or, equivalently, a trade-
off between the bias and the variance of the inductive process. A classification algorithm
partitions the instance space into regions. When the number of features is relatively
small, the regions are too large, so the partitioning of the instances generalizes poorly
and accuracy decreases; this phenomenon is called bias. When the number of features is
high, the probability that individual regions are labeled with the wrong class also
increases; this effect is called variance. Decision tree and neural network classifiers
are particularly sensitive to variance. From this emerges the concept of irrelevant or
redundant features, which may cause classification algorithms to lose efficiency and
accuracy, whereas a subset of features that improves the performance of learning
algorithms is called an optimal subset. All these aspects of the sensitivity of learning
algorithms to dataset dimensionality are collectively referred to as the curse of
dimensionality by Kira and Rendell [20].
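To make the variance effect concrete, the following is a minimal sketch (not part of the original text) that evaluates a 1NN classifier on a synthetic two-class dataset while progressively adding irrelevant features. The dataset size, the feature counts and the use of scikit-learn are illustrative assumptions; as the number of noise features grows, cross-validated accuracy typically drops.

```python
# Illustrative sketch: adding irrelevant features degrades 1NN accuracy,
# i.e. the variance side of the bias/variance trade-off discussed above.
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

for n_irrelevant in (0, 10, 50, 200):
    X, y = make_classification(
        n_samples=200,
        n_features=5 + n_irrelevant,   # 5 informative features plus noise
        n_informative=5,
        n_redundant=0,
        n_repeated=0,
        random_state=0,
    )
    acc = cross_val_score(KNeighborsClassifier(n_neighbors=1), X, y, cv=5).mean()
    print(f"{n_irrelevant:3d} irrelevant features -> 1NN accuracy {acc:.2f}")
```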
The optimal subset can be identified by means of a feature evaluation function [8]. In
classification, an Evaluation Function (EF) expresses, for each feature subset,
its ability to discriminate between classes. The effectiveness of the EF in highlighting
the relative importance of features depends on the search strategy by which the space
of all possible subsets is explored, and it has measurable properties: accuracy (how
accurate the prediction of the EF is), generality (how suitable the EF is for different
classifiers) and time complexity (the time taken to compute the EF). A selection based
on classification accuracy can be considered effective if the classifier error rate does
not significantly increase after selection. The authors indicate the 1NN classifier as
a convenient algorithm on which to build the evaluation function, since it provides
reasonable classification performance in most applications.
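As an illustration of such an evaluation function, the sketch below scores a candidate feature subset by the cross-validated accuracy of a 1NN classifier restricted to that subset. This is a hypothetical wrapper-style implementation assuming scikit-learn, not the specific EF used by the cited authors.

```python
# Illustrative wrapper-style EF: cross-validated 1NN accuracy on a feature subset.
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

def evaluate_subset(X, y, subset, cv=5):
    """Return the mean cross-validated 1NN accuracy using only the columns in `subset`."""
    if len(subset) == 0:
        return 0.0
    X_sub = X[:, list(subset)]
    scores = cross_val_score(KNeighborsClassifier(n_neighbors=1), X_sub, y, cv=cv)
    return scores.mean()
```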
4.2.2 Classical Feature Selection Strategies
The FS process is generally divided into two phases: FR and FS in the strict sense. It is
necessary to rank the relative importance of features before proceeding to an optimal
selection and then learning a classification model, although these two phases can
be integrated in different ways, as will be discussed in this section. Progress
in scientific research has largely coincided for ranking and selection. As in the surveys
of [2, 15], FS methods are grouped into two main categories: (i) methods that
explore the space of possible subsets, searching for an optimal subset of features and
using heuristics to limit the computational complexity; (ii) methods that rank features
individually, based on properties that good features are presumed to have, such as
their contribution to class separability. In the classification learning process, the input
dataset is arranged in an n × m matrix where each row, or record, represents an instance
and each column a feature.
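The following sketch contrasts the two categories on such a feature matrix X and label vector y: a Fisher-like separability score that ranks features individually (category ii) and a greedy forward search that uses 1NN accuracy as a heuristic to explore subsets (category i). Both the scoring criterion and the stopping rule are illustrative assumptions rather than the methods of the cited surveys.

```python
# Illustrative sketch of the two families of FS methods named above.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

def fisher_scores(X, y):
    """Rank features individually: per-feature between-class over within-class scatter."""
    classes = np.unique(y)
    overall_mean = X.mean(axis=0)
    between = np.zeros(X.shape[1])
    within = np.zeros(X.shape[1])
    for c in classes:
        Xc = X[y == c]
        between += len(Xc) * (Xc.mean(axis=0) - overall_mean) ** 2
        within += len(Xc) * Xc.var(axis=0)
    return between / (within + 1e-12)   # higher score = better class separability

def forward_selection(X, y, max_features=5, cv=5):
    """Heuristic subset search: greedily add the feature that most improves 1NN accuracy."""
    selected, remaining = [], list(range(X.shape[1]))
    best_acc = 0.0
    while remaining and len(selected) < max_features:
        accs = {
            f: cross_val_score(KNeighborsClassifier(n_neighbors=1),
                               X[:, selected + [f]], y, cv=cv).mean()
            for f in remaining
        }
        f_best = max(accs, key=accs.get)
        if accs[f_best] <= best_acc:
            break                        # stop when no candidate improves accuracy
        best_acc = accs[f_best]
        selected.append(f_best)
        remaining.remove(f_best)
    return selected, best_acc
```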