Generally, the objective of FS is to identify the features in the data set which are important, and to discard the others as redundant or irrelevant. Since FS reduces the dimensionality of the data, DM algorithms, especially predictive ones, can operate faster and obtain better outcomes when FS is applied. This improvement stems mainly from an easier and more compact representation of the target concept [6].
Reasons for performing FS may include [48]:
- removing irrelevant data;
- increasing the predictive accuracy of learned models;
- reducing the cost of the data;
- improving learning efficiency, such as reducing storage requirements and computational cost;
- reducing the complexity of the resulting model description, improving the understanding of the data and the model.
7.2 Perspectives
Although FS is used in all types and paradigms of learning, the field where it is best known and most commonly applied is classification, and we will focus our efforts mainly there. The problem of FS can be explored from many perspectives. The four most important are (1) the search for the best subset of features, (2) the criteria for evaluating different subsets, (3) the principle for selecting, adding, removing or changing features during the search and (4) the applications.
First of all, FS is considered a search problem for an optimal subset of features, for general or specific purposes, depending on the learning task and the kind of algorithm. Secondly, a survey of evaluation criteria is needed to determine which methods suit which applications. Third, the way features are evaluated is crucial for categorizing methods according to the direction of the search process; whether the evaluation is univariate or multivariate is also a key factor in FS. Lastly, we will specifically study the interaction between FS and classification.
7.2.1 The Search of a Subset of Features
FS can be considered a search problem, where each state of the search space corresponds to a concrete subset of selected features. A selection can be represented as a binary array, with each element taking the value 1 if the corresponding feature is currently selected by the algorithm and 0 otherwise. Hence, there is a total of 2^M subsets, where M is the number of features of the data set. A simple case of the search space for three features is depicted in Fig. 7.1. The optimal subset lies somewhere between the beginning and the end of this graph.
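This binary-mask representation can be sketched in a few lines of Python. The snippet below, a minimal illustration rather than a practical FS method, enumerates all 2^M states of the search space for a toy case with M = 3 features (exhaustive enumeration is only feasible for small M; the evaluation criterion that would rank the subsets is deliberately omitted here):

```python
from itertools import product

def enumerate_subsets(n_features):
    """Yield every state of the FS search space as a binary mask.

    Each mask is a tuple of 0/1 values: 1 means the feature at that
    position is selected, 0 means it is not.
    """
    # product([0, 1], repeat=M) yields exactly 2**M masks
    for mask in product([0, 1], repeat=n_features):
        yield mask

# Toy case with M = 3 features: 2^3 = 8 candidate subsets,
# from the empty subset (0, 0, 0) to the full set (1, 1, 1)
subsets = list(enumerate_subsets(3))
print(len(subsets))              # 8
print(subsets[0], subsets[-1])   # (0, 0, 0) (1, 1, 1)
```

A real FS algorithm would not enumerate all masks but move through this space, flipping bits to add or remove features, guided by an evaluation criterion such as predictive accuracy.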