Table 7.2 Accuracy metric and derivatives for a two-class (positive class and negative class) problem

Measure             Mathematical form
Accuracy            (tp + tn) / (tp + fp + tn + fn)
Error rate          1 − Accuracy
Chi-squared         n (fp × fn − tp × tn)^2 / [(tp + fp)(tp + fn)(fp + tn)(tn + fn)]
Information gain    e(tp + fn, fp + tn) − [(tp + fp) e(tp, fp) + (tn + fn) e(fn, tn)] / (tp + fp + tn + fn),
                    where e(x, y) = −(x / (x + y)) log2(x / (x + y)) − (y / (x + y)) log2(y / (x + y))
Odds ratio          (tpr / (1 − tpr)) / (fpr / (1 − fpr)) = (tp × tn) / (fp × fn)
Probability ratio   tpr / fpr

Here tp, tn, fp and fn are the true positive, true negative, false positive and false negative counts, n = tp + fp + tn + fn, tpr = tp / (tp + fn) and fpr = fp / (fp + tn).
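To make the table concrete, the following minimal Python sketch (not from the book; the function names two_class_metrics and entropy2 are our own) computes every measure in Table 7.2 directly from the four confusion-matrix counts.

```python
import math

def entropy2(x, y):
    """e(x, y) from Table 7.2; the convention 0 * log2(0) = 0 is used."""
    total = x + y
    e = 0.0
    for count in (x, y):
        if count > 0:
            p = count / total
            e -= p * math.log2(p)
    return e

def two_class_metrics(tp, tn, fp, fn):
    """All Table 7.2 measures from the confusion-matrix counts.
    Assumes no zero denominators (every cell of the matrix is non-empty)."""
    n = tp + tn + fp + fn
    tpr = tp / (tp + fn)          # true positive rate
    fpr = fp / (fp + tn)          # false positive rate
    accuracy = (tp + tn) / n
    return {
        "accuracy": accuracy,
        "error_rate": 1 - accuracy,
        "chi_squared": n * (fp * fn - tp * tn) ** 2
                       / ((tp + fp) * (tp + fn) * (fp + tn) * (tn + fn)),
        "information_gain": entropy2(tp + fn, fp + tn)
                            - ((tp + fp) * entropy2(tp, fp)
                               + (tn + fn) * entropy2(fn, tn)) / n,
        "odds_ratio": (tp * tn) / (fp * fn),
        "probability_ratio": tpr / fpr,
    }

# Example: 40 true positives, 45 true negatives, 5 false positives, 10 false negatives.
print(two_class_metrics(tp=40, tn=45, fp=5, fn=10))
```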
7.2.3 Filter, Wrapper and Embedded Feature Selection
This is surely the best-known and most widely employed categorization of FS methods [33]. In the following, we detail the three well-known categories of feature selectors: filter, wrapper and embedded.
7.2.3.1 Filters
There has been an extensive research effort in the development of indirect performance measures for selecting features, mostly based on the four evaluation measures described before (information, distance, dependency and consistency). This model is called the filter model.
The filter approach operates independently of the DM method subsequently employed. The name "filter" derives from filtering the undesirable features out before learning. Filters use heuristics based on general characteristics of the data to evaluate the goodness of feature subsets.
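As an illustration, a minimal filter might discard every feature whose dependency with the class label falls below a threshold, before any learner is involved. The sketch below is our own: correlation_filter and the threshold value are illustrative choices, with Pearson correlation standing in for any dependency measure.

```python
import math

def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0

def correlation_filter(X, y, threshold=0.1):
    """Keep indices of features whose |correlation| with the label reaches
    the threshold; runs before, and independently of, any DM method."""
    kept = []
    for j in range(len(X[0])):
        column = [row[j] for row in X]
        if abs(pearson(column, y)) >= threshold:
            kept.append(j)
    return kept

# Feature 0 tracks the label, feature 1 is noise: only index 0 survives.
X = [[1, 7], [2, 3], [8, 5], [9, 2]]
y = [0, 0, 1, 1]
print(correlation_filter(X, y, threshold=0.5))   # -> [0]
```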
Some authors differentiate a sub-category of filters called rankers. It includes methods that apply some criterion with which to score each feature and provide a ranking. Using this ordering, the subsequent learning process or a user-defined threshold can decide the number of useful features.
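A minimal ranker can be sketched as follows (our own illustration; rank_features, select_top_k and the toy mean_diff score are hypothetical names, and any criterion from Table 7.2, such as information gain or chi-squared, could replace the scoring function).

```python
def mean_diff(column, y):
    """Toy scoring criterion: gap between the class-conditional means."""
    pos = [v for v, label in zip(column, y) if label == 1]
    neg = [v for v, label in zip(column, y) if label == 0]
    return abs(sum(pos) / len(pos) - sum(neg) / len(neg))

def rank_features(X, y, score=mean_diff):
    """Score every feature and return (index, score) pairs, best first."""
    ranking = [(j, score([row[j] for row in X], y)) for j in range(len(X[0]))]
    return sorted(ranking, key=lambda pair: pair[1], reverse=True)

def select_top_k(ranking, k):
    """A user-defined threshold on the ranking: keep the k best features."""
    return [j for j, _ in ranking[:k]]

X = [[1, 7, 0], [2, 3, 1], [8, 5, 0], [9, 2, 1]]
y = [0, 0, 1, 1]
ranking = rank_features(X, y)
print(ranking)                   # features ordered by decreasing score
print(select_top_k(ranking, 1))  # e.g. keep only the single best feature
```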
The reasons that motivate the use of filters are related to noise removal, data simplification and increasing the performance of any DM technique. They are well suited to high-dimensional data and provide general subsets of features that can be useful for any kind of learning process: rule induction, Bayesian models or ANNs.
A filter model of FS consists of two stages (see Fig. 7.2): (1) FS using measures such as information, distance, dependence or consistency, independently of the learning algorithm; (2) learning and testing, in which the algorithm learns from the training data restricted to the best feature subset obtained and is tested over the test data. Stage 2 is the usual learning and testing stage of any DM process.
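The two stages can be captured in a few lines of Python. In the sketch below (our own; filter_pipeline and its select, learn and evaluate parameters are hypothetical placeholders for any selection measure, DM algorithm and quality metric), stage 1 never sees the learner and stage 2 never sees the discarded features. Any of the earlier sketches, such as correlation_filter, could serve as the select argument.

```python
def filter_pipeline(X_train, y_train, X_test, y_test, select, learn, evaluate):
    # Stage 1: pick a feature subset using only general characteristics of
    # the training data, with complete independence of the learning algorithm.
    kept = select(X_train, y_train)

    def project(X):
        """Restrict a dataset to the selected feature indices."""
        return [[row[j] for j in kept] for row in X]

    # Stage 2: learn from the training data restricted to the chosen subset,
    # then test the resulting model on the (equally restricted) test data.
    model = learn(project(X_train), y_train)
    return evaluate(model, project(X_test), y_test)
```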