Generally, the objective of FS is to identify the features in the data set which are important, and to discard the others as redundant or irrelevant. Since FS reduces the dimensionality of the data, DM algorithms, especially predictive ones, can operate faster and obtain better outcomes when FS is applied. This improvement stems mainly from an easier and more compact representation of the target concept [6].
Reasons for performing FS may include [48]:
- removing irrelevant data;
- increasing the predictive accuracy of learned models;
- reducing the cost of the data;
- improving learning efficiency, such as reducing storage requirements and computational cost;
- reducing the complexity of the resulting model description, improving the understanding of the data and the model.
7.2 Perspectives
Although FS is used in all types and paradigms of learning, the field where it is best known and most commonly applied is classification, and we will focus our efforts mainly there. The problem of FS can be explored from many perspectives. The four most important are (1) the search for the best subset of features, (2) the criteria for evaluating different subsets, (3) the principle for selecting, adding, removing or changing features during the search and (4) the applications.
First of all, FS is considered a search problem for an optimal subset of features, for general or specific purposes, depending on the learning task and the kind of algorithm. Secondly, a survey of evaluation criteria is needed to determine which methods suit which applications. Third, the way features are evaluated is crucial for categorizing methods according to the direction of the search process; whether the evaluation is univariate or multivariate is also a key factor in FS. Lastly, we will specifically study the interaction between FS and classification.
7.2.1 The Search of a Subset of Features
FS can be considered a search problem, where each state of the search space corresponds to a concrete subset of selected features. A selection can be represented as a binary array, with each element taking the value 1 if the corresponding feature is currently selected by the algorithm and 0 otherwise. Hence, there is a total of 2^M subsets, where M is the number of features of the data set. A simple case of the search space for three features is depicted in Fig. 7.1. The optimal subset lies somewhere between the beginning and the end of this graph.
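This binary-mask representation can be sketched in a few lines of Python. The snippet below, a minimal illustration rather than a practical FS method, enumerates all 2^M states of the search space for a toy case with M = 3 features (exhaustive enumeration is only feasible for small M; the evaluation criterion that would rank the subsets is deliberately omitted here):

```python
from itertools import product

def enumerate_subsets(n_features):
    """Yield every state of the FS search space as a binary mask.

    Each mask is a tuple of 0/1 values: 1 means the feature at that
    position is selected, 0 means it is not.
    """
    # product([0, 1], repeat=M) yields exactly 2**M masks
    for mask in product([0, 1], repeat=n_features):
        yield mask

# Toy case with M = 3 features: 2^3 = 8 candidate subsets,
# from the empty subset (0, 0, 0) to the full set (1, 1, 1)
subsets = list(enumerate_subsets(3))
print(len(subsets))              # 8
print(subsets[0], subsets[-1])   # (0, 0, 0) (1, 1, 1)
```

A real FS algorithm would not enumerate all masks but move through this space, flipping bits to add or remove features, guided by an evaluation criterion such as predictive accuracy.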