nomenclature of feature transformation in the literature. In fact, feature transformation approaches are also called feature extraction or feature reduction approaches. The key difference between feature selection and feature transformation is that the former selects a subset of the original features, while the latter generates a completely new set of features to represent a given dataset.
In this chapter, we focus on feature selection approaches. Feature selection approaches do not transform the original features; they only remove redundant and irrelevant features, preserving the semantic meaning of the original data. Feature selection techniques can be divided into continuous and binary techniques: continuous techniques assign a continuous weight to each feature, while binary techniques assign each feature a binary weight. Feature selection techniques can also follow either the filter or the wrapper model. In the filter model, feature selection is performed before the learning phase and works as a pre-processing step of the learning algorithm. In the wrapper model, the feature selection algorithm uses the learning algorithm as a subroutine. The main disadvantage of the wrapper model is the huge computational effort required to run the learning algorithm on each candidate feature subset [18].
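To make the distinction concrete, the sketch below contrasts the two models in Python using scikit-learn. The particular scorer and learner (mutual information for the filter, recursive feature elimination around a logistic regression for the wrapper) are illustrative choices, not methods prescribed by this chapter.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import RFE, SelectKBest, mutual_info_classif
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X = StandardScaler().fit_transform(X)  # scale once so the learner converges

# Filter model: each feature is scored independently of any learner,
# as a pre-processing step (here, by mutual information with the class).
X_filter = SelectKBest(mutual_info_classif, k=10).fit_transform(X, y)

# Wrapper model: a learning algorithm is run as a subroutine to evaluate
# candidate subsets, which is far more expensive (here, recursive feature
# elimination wrapped around a logistic regression).
wrapper = RFE(LogisticRegression(max_iter=1000), n_features_to_select=10)
X_wrapper = wrapper.fit_transform(X, y)
```

The wrapper call refits the learner once per eliminated feature, which illustrates why wrapper approaches are the more computationally demanding of the two models.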
A vast number of feature selection algorithms have been presented in the literature. One of the earliest algorithms was presented in [19], where the property of monotonicity is employed to prune the search space and divergence is employed to evaluate features.
One of the most well-known feature selection algorithms is Relief [14]. The general principle of Relief is to measure the quality of features according to how well their values distinguish instances of different classes. Given an instance S selected at random from a dataset R with k features (attributes), Relief searches the dataset for the nearest neighbor of the same class, called the nearest hit H, and for the nearest neighbor of a different class, called the nearest miss M. It then updates the quality estimator W[f_i] of every feature f_i according to the difference between the feature values of the instances S, H, and M. This process is repeated n times, where n is a parameter specified by the user. The time complexity of Relief is O(nkN), where N is the number of instances in the dataset and k is the number of features. Relief returns a ranking of the attributes ordered by relevance, but it does not indicate how many features should be selected. One limitation of the Relief algorithm is that it works only for datasets with binary classes. This limitation is overcome by Relief-F [15], which also handles datasets with multi-valued classes.
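A minimal sketch of the Relief weight update is given below, assuming numeric features scaled to [0, 1], binary class labels, and Manhattan distance for the nearest-neighbor search; the function and variable names are illustrative, not taken from the original paper.

```python
import numpy as np

def relief(X, y, n_iterations, seed=None):
    """Estimate a quality weight W[f] for each of the k features in X.

    X : (N, k) array of numeric features, assumed scaled to [0, 1]
    y : (N,) array of binary class labels
    Returns an array of k weights; higher means more relevant.
    """
    rng = np.random.default_rng(seed)
    N, k = X.shape
    W = np.zeros(k)

    for _ in range(n_iterations):
        i = rng.integers(N)                 # randomly selected instance S
        s, s_class = X[i], y[i]

        dists = np.abs(X - s).sum(axis=1)   # distance from S to all instances
        dists[i] = np.inf                   # exclude S itself

        same = np.where(y == s_class)[0]
        diff = np.where(y != s_class)[0]
        hit = same[np.argmin(dists[same])]  # nearest hit H
        miss = diff[np.argmin(dists[diff])] # nearest miss M

        # Penalize features that separate S from its hit,
        # reward features that separate S from its miss.
        W -= np.abs(s - X[hit]) / n_iterations
        W += np.abs(s - X[miss]) / n_iterations

    return W
```

Each of the n iterations scans all N instances over k features to find H and M, which matches the O(nkN) complexity stated above; ranking the features by the returned weights gives the relevance ordering that Relief produces.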
Another well-known feature selection technique is the Decision Tree Method (DTM) [8]. DTM adopts a forward search to generate feature subsets, using the entropy criterion to evaluate them. DTM runs C4.5 [21], an algorithm that builds a decision tree. Since a decision tree is a sequence of attributes that defines the state of an instance, DTM selects the features that appear in the pruned decision tree as the best subset; that is, the features appearing on the path to any leaf node of the pruned tree are selected as the best subset.
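A rough sketch of this idea follows, using scikit-learn's CART implementation with cost-complexity pruning as a stand-in for C4.5 (which scikit-learn does not provide); the dataset, the pruning parameter, and the variable names are illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

# Grow an entropy-based tree; ccp_alpha > 0 applies cost-complexity
# pruning, standing in for C4.5's pruning step.
tree = DecisionTreeClassifier(criterion="entropy", ccp_alpha=0.01,
                              random_state=0)
tree.fit(X, y)

# Keep exactly the features that appear as split nodes in the pruned tree
# (leaf nodes are stored with a negative feature index).
used = np.unique(tree.tree_.feature[tree.tree_.feature >= 0])
print("Selected feature indices:", used)
X_selected = X[:, used]
```

Because every split node lies on the path to some leaf, collecting the split features is equivalent to collecting the features that appear on the paths to the leaves of the pruned tree.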