It has been shown by Nilsson and co-workers that an exact solution of the all-relevant
problem requires an exhaustive search, which is intractable for all but the smallest
systems.
The relevance defined earlier is a qualitative notion: a feature is either relevant
or irrelevant. It is also an objective property of the system under scrutiny,
independent of the classifier used for building a model. This notion is distinct
from the importance of a variable, which is a quantitative and classifier-dependent
measure of the contribution of a variable to a model of the system. One can use various
measures of variable importance, provided that they satisfy a simple condition: the
importance of relevant variables should be higher than the importance of irrelevant ones.
A useful and intuitive measure of importance was introduced by Breiman in the random
forest (RF) classification algorithm [2].
Definition 6 (Importance of a variable) The importance of a variable is the loss of
classification accuracy of the model built using this variable when the information
on the variable's value is withdrawn.
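Definition 6 can be sketched in code. The snippet below is only an illustration, not the original method: Breiman's measure is defined for a random forest, while here a simple nearest-centroid classifier stands in for the model, and the toy data set (two informative features plus three noise features) is hypothetical. The information on a variable is "withdrawn" by permuting its column in the test set.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: features 0 and 1 carry the signal, 2-4 are noise.
n = 400
X = rng.normal(size=(n, 5))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
X_tr, y_tr, X_te, y_te = X[:300], y[:300], X[300:], y[300:]

# Stand-in classifier (nearest class centroid), used only to illustrate the
# measure; the original definition uses a random forest.
mu0 = X_tr[y_tr == 0].mean(axis=0)
mu1 = X_tr[y_tr == 1].mean(axis=0)

def accuracy(X, y):
    pred = (np.linalg.norm(X - mu1, axis=1)
            < np.linalg.norm(X - mu0, axis=1)).astype(int)
    return (pred == y).mean()

baseline = accuracy(X_te, y_te)

# Importance of variable j: drop in accuracy after withdrawing the
# information in column j by permuting its values between objects.
importance = []
for j in range(X_te.shape[1]):
    X_perm = X_te.copy()
    X_perm[:, j] = rng.permutation(X_perm[:, j])
    importance.append(baseline - accuracy(X_perm, y_te))
```

With this construction the informative features receive a clearly positive importance, while the noise features hover around zero, which is exactly the property required of an importance measure above.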
A final concept that will be used often enough in the current chapter to deserve a
mention in this section is the contrast variable.
Definition 7 (Contrast variable) A contrast variable is a descriptive variable that,
by design, does not carry information on the decision variable.
It is added to the system in order to discern relevant from irrelevant variables. It may
be obtained by drawing from a theoretically justified probability distribution, e.g. normal
or uniform; it may also be obtained from real variables by randomly permuting their
values between objects. The application of contrast variables to feature selection
was first proposed by Stoppiglia et al. [15] and then independently by Tuv et al. [17]
and Rudnicki et al. [14].
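The permutation-based construction of contrast variables can be sketched as follows; the function name is hypothetical, but the operation is the one described above: each real variable is copied and its values are shuffled between objects, destroying any relation to the decision variable while preserving the marginal distribution.

```python
import numpy as np

def add_contrast_variables(X, rng):
    """Append one contrast ("shadow") variable per real variable: the same
    values, randomly permuted between objects, so any relation to the
    decision variable is destroyed while the marginal distribution is kept."""
    shadows = np.column_stack(
        [rng.permutation(X[:, j]) for j in range(X.shape[1])]
    )
    return np.hstack([X, shadows])

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
X_aug = add_contrast_variables(X, rng)  # 100 x 6: originals, then shadows
```

Because a contrast variable is irrelevant by construction, the importances it attains provide a natural reference level: real variables whose importance does not exceed that of the contrasts can be treated as irrelevant.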
One may notice that any all-relevant feature selection algorithm is a special type of
classification algorithm: it assigns variables to one of two classes, relevant or
non-relevant. Hence the performance of such algorithms can be measured using the same
quantities that are used for evaluating ordinary classifiers. Two measures are
particularly useful: sensitivity S and positive predictive value PPV. Sensitivity S
is measured as
S = TP/(TP + FN), (2.1)
where TP is the number of truly relevant features recognised by an algorithm, FN is the
number of truly relevant features that are not recognised by the algorithm, and FP is the
number of non-relevant features that are incorrectly recognised as relevant. The positive
predictive value PPV is measured as
PPV = TP/(TP + FP). (2.2)
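A small numeric illustration of Eqs. (2.1) and (2.2); the function names and counts are hypothetical, chosen only to show the arithmetic: an algorithm that finds 8 of 10 truly relevant features while also flagging 2 irrelevant ones.

```python
def sensitivity(tp, fn):
    # Eq. (2.1): fraction of the truly relevant features that were recognised.
    return tp / (tp + fn)

def positive_predictive_value(tp, fp):
    # Eq. (2.2): fraction of the features reported as relevant that truly are.
    return tp / (tp + fp)

# Hypothetical run: 8 of 10 relevant features found, plus 2 false positives.
s = sensitivity(tp=8, fn=2)                  # 0.8
ppv = positive_predictive_value(tp=8, fp=2)  # 0.8
```

Note that the two measures answer complementary questions: S penalises missed relevant features, while PPV penalises irrelevant features reported as relevant.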