In a classification task, FS is used to predict the so-called intrinsic discriminant
dimension of the dataset, defined by Lee and Landgrebe [24] as
the smallest dimensional subspace in which the same classification accuracy can be
obtained as in the original space. The effects of FS on accuracy have
more recently been studied by Sima et al. [34]. In [21, 35], the problem of FS is
seen as a trade-off between generalization and specialization or, equivalently, a trade-
off between the bias and the variance of the inductive process. A classification algorithm
partitions the instance space into regions. When the number of features is relatively
small, the regions are too large, so the partitioning of the instances generalizes poorly
and accuracy decreases; this phenomenon is called bias. When the number of features is
high, the probability that individual regions are labeled with the wrong class also
increases; this effect is called variance. Decision tree and neural network classifiers
are particularly sensitive to variance. From this emerges the concept of irrelevant or
redundant features, which may cause classification algorithms to lose efficiency and
accuracy, whereas a subset of features that improves the performance of learning
algorithms is called an optimal subset. All these aspects of the sensitivity of learning
algorithms to dataset dimensionality are collectively referred to as the curse of
dimensionality by Kira and Rendell [20].
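To make the variance effect concrete, the following is a minimal sketch (not part of the original text) that evaluates a 1NN classifier on a synthetic two-class dataset while progressively adding irrelevant features. The dataset size, the feature counts and the use of scikit-learn are illustrative assumptions; as the number of noise features grows, cross-validated accuracy typically drops.

```python
# Illustrative sketch: adding irrelevant features degrades 1NN accuracy,
# i.e. the variance side of the bias/variance trade-off discussed above.
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

for n_irrelevant in (0, 10, 50, 200):
    X, y = make_classification(
        n_samples=200,
        n_features=5 + n_irrelevant,   # 5 informative features plus noise
        n_informative=5,
        n_redundant=0,
        n_repeated=0,
        random_state=0,
    )
    acc = cross_val_score(KNeighborsClassifier(n_neighbors=1), X, y, cv=5).mean()
    print(f"{n_irrelevant:3d} irrelevant features -> 1NN accuracy {acc:.2f}")
```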
The optimal subset can be identified by means of a feature evaluation function [8]. In
classification, an Evaluation Function (EF) expresses, for each feature subset,
its ability to discriminate between classes. The effectiveness of the EF in highlighting
the relative importance of features depends on the search strategy by which the space
of all possible subsets is explored, and it has measurable properties: accuracy (how
accurate the prediction of the EF is), generality (how suitable the EF is for different
classifiers) and time complexity (the time taken to compute the EF). A selection based
on classification accuracy can be considered effective if the classifier error rate does
not significantly increase after selection. The authors indicate the 1NN classifier as
a convenient algorithm on which to build the evaluation function, since it provides
reasonable classification performance in most applications.
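As an illustration of such an evaluation function, the sketch below scores a candidate feature subset by the cross-validated accuracy of a 1NN classifier restricted to that subset. This is a hypothetical wrapper-style implementation assuming scikit-learn, not the specific EF used by the cited authors.

```python
# Illustrative wrapper-style EF: cross-validated 1NN accuracy on a feature subset.
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

def evaluate_subset(X, y, subset, cv=5):
    """Return the mean cross-validated 1NN accuracy using only the columns in `subset`."""
    if len(subset) == 0:
        return 0.0
    X_sub = X[:, list(subset)]
    scores = cross_val_score(KNeighborsClassifier(n_neighbors=1), X_sub, y, cv=cv)
    return scores.mean()
```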
4.2.2 Classical Feature Selection Strategies
The FS process is generally divided into two phases: FR and FS in the strict sense. It is
necessary to rank the relative importance of features before proceeding to an optimal
selection and then learning a classification model, although these two phases can
be integrated in different ways, as will be discussed in this section. Progress
in scientific research has largely coincided for ranking and selection. As in the surveys
of [2, 15], FS methods are grouped into two main categories: (i) methods that
explore the space of possible subsets, searching for an optimal subset of features and
using heuristics to limit the computational complexity; (ii) methods that rank features
individually, based on properties that good features are presumed to have, such as
their contribution to class separability. In the classification learning process, the input
dataset is arranged in an n × m matrix where each row, or record, represents an instance
and each column a feature.
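The following sketch contrasts the two categories on such a feature matrix X and label vector y: a Fisher-like separability score that ranks features individually (category ii) and a greedy forward search that uses 1NN accuracy as a heuristic to explore subsets (category i). Both the scoring criterion and the stopping rule are illustrative assumptions rather than the methods of the cited surveys.

```python
# Illustrative sketch of the two families of FS methods named above.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

def fisher_scores(X, y):
    """Rank features individually: per-feature between-class over within-class scatter."""
    classes = np.unique(y)
    overall_mean = X.mean(axis=0)
    between = np.zeros(X.shape[1])
    within = np.zeros(X.shape[1])
    for c in classes:
        Xc = X[y == c]
        between += len(Xc) * (Xc.mean(axis=0) - overall_mean) ** 2
        within += len(Xc) * Xc.var(axis=0)
    return between / (within + 1e-12)   # higher score = better class separability

def forward_selection(X, y, max_features=5, cv=5):
    """Heuristic subset search: greedily add the feature that most improves 1NN accuracy."""
    selected, remaining = [], list(range(X.shape[1]))
    best_acc = 0.0
    while remaining and len(selected) < max_features:
        accs = {
            f: cross_val_score(KNeighborsClassifier(n_neighbors=1),
                               X[:, selected + [f]], y, cv=cv).mean()
            for f in remaining
        }
        f_best = max(accs, key=accs.get)
        if accs[f_best] <= best_acc:
            break                        # stop when no candidate improves accuracy
        best_acc = accs[f_best]
        selected.append(f_best)
        remaining.remove(f_best)
    return selected, best_acc
```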