harder, should be easier when all relevant variables are identified and hence the
number of variables is reduced.
For example, in our toy problem a perfect algorithm for all-relevant feature selection should find that 10 variables out of 1,000 are somehow connected with the decision variable; the number of models tested in the brute-force approach can therefore be reduced to 45. In a medical problem, where a researcher studies the connection between gene expression and a medical condition, the number of genes to consider may be reduced from many thousands to hundreds, tens, or perhaps even a handful of variables. Domain-specific knowledge can then be applied to build a model of the problem under scrutiny.
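The size of this reduction can be checked directly, under the assumption that the brute-force search fits one candidate model per pair of variables (the figure of 45 equals the number of pairs among 10 variables):

```python
from math import comb

# Brute-force model search over all pairs of variables (an assumption
# about the toy problem: each candidate model uses two variables).
all_pairs = comb(1000, 2)      # pairs among all 1,000 candidate variables
relevant_pairs = comb(10, 2)   # pairs among the 10 relevant variables

print(all_pairs)       # 499500 models before feature selection
print(relevant_pairs)  # 45 models after feature selection
```

Restricting the search to the relevant variables thus shrinks the model space by four orders of magnitude.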
2.1.1 Definitions
Up to this point the notion of relevance has been used without a definition; instead we relied on its intuitive understanding. However, it has already been observed by Kohavi and John [7] that there are several definitions of relevance that may be contradictory and misleading. They proposed that two degrees of relevance (strong and weak) are required to encompass all notions usually associated with this term. In their approach relevance is defined in absolute terms, with the help of an ideal Bayes classifier.
Definition 1 A feature X is strongly relevant when removal of X alone from the data
always results in deterioration of the prediction accuracy of the ideal Bayes classifier.
Definition 2 A feature X is weakly relevant if it is not strongly relevant and there exists a subset of features S such that the performance of the ideal Bayes classifier on S is worse than its performance on S ∪ {X}.
Definition 3 A feature X is irrelevant if it is neither strongly nor weakly relevant.
One should note that an information system might be constructed in such a way that there are no strongly relevant attributes. Indeed, it is easy to notice that the toy system described above does not contain strongly relevant attributes.
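The distinction between strong and weak relevance can be made concrete on a minimal hypothetical distribution (not the toy system itself, but a stand-in illustrating the same effect): the class equals a feature x1, and x2 is an exact copy of x1. An ideal Bayes classifier can be evaluated exactly by enumeration:

```python
# Tiny joint distribution over ((x1, x2), y): x2 duplicates x1 and y = x1.
# Every informative attribute has a perfect copy, so neither attribute is
# strongly relevant, yet both are weakly relevant.
samples = [((x, x), x) for x in (0, 1)]  # uniform weights

def bayes_accuracy(feature_idx):
    """Accuracy of the ideal Bayes classifier using the listed features."""
    groups = {}
    for features, y in samples:
        key = tuple(features[i] for i in feature_idx)
        groups.setdefault(key, []).append(y)
    # For each observed feature value the ideal classifier predicts the
    # majority class of that group.
    correct = sum(max(ys.count(0), ys.count(1)) for ys in groups.values())
    return correct / len(samples)

print(bayes_accuracy([0, 1]))  # 1.0 -- both features present
print(bayes_accuracy([1]))     # 1.0 -- removing x1 costs nothing: not strongly relevant
print(bayes_accuracy([]))      # 0.5 -- yet adding x1 to S = {} helps: weakly relevant
```

Removing x1 alone never hurts (Definition 1 fails), but there is a subset S = {} on which adding x1 improves the classifier (Definition 2 holds); the same argument applies to x2 by symmetry.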
Other useful notions were introduced by Nilsson et al. [13], who used the concepts of weakly and strongly relevant features to formally define two problems of feature selection. The goal of the minimal-optimal problem in feature selection is to find the minimal set of attributes giving the best possible classifier. The other is the all-relevant problem, in which one is interested in finding all strongly and weakly relevant attributes.
Definition 4 ( Minimal-optimal problem ) Find a set of attributes consisting of all strongly relevant attributes and a subset of the weakly relevant attributes such that all remaining weakly relevant attributes contain only redundant information.
Definition 5 ( All-relevant problem ) Find all strongly relevant and all weakly relevant attributes.
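On a small enough system both problems can be solved by exhaustive search, which makes the difference between them visible. The sketch below reuses the hypothetical duplicated-feature distribution (x2 copies x1, y = x1) and an exact ideal Bayes classifier; it is an illustration of the two definitions, not of any particular algorithm from the literature:

```python
from itertools import combinations

# Hypothetical distribution: ((x1, x2), y) with x2 = x1 and y = x1.
samples = [((x, x), x) for x in (0, 1)]
n_features = 2

def bayes_accuracy(feature_idx):
    """Exact accuracy of the ideal Bayes classifier on the given features."""
    groups = {}
    for features, y in samples:
        key = tuple(features[i] for i in feature_idx)
        groups.setdefault(key, []).append(y)
    return sum(max(ys.count(0), ys.count(1)) for ys in groups.values()) / len(samples)

all_subsets = [list(s) for r in range(n_features + 1)
               for s in combinations(range(n_features), r)]
best = max(bayes_accuracy(s) for s in all_subsets)

# Minimal-optimal problem: a smallest subset reaching the best accuracy.
minimal_optimal = min((s for s in all_subsets if bayes_accuracy(s) == best),
                      key=len)

# All-relevant problem: every attribute that improves the classifier when
# added to some subset of the remaining attributes (weak relevance; strong
# relevance is the special case where S contains all other attributes).
def relevant(i):
    others = [j for j in range(n_features) if j != i]
    return any(bayes_accuracy(list(s) + [i]) > bayes_accuracy(list(s))
               for r in range(len(others) + 1)
               for s in combinations(others, r))

all_relevant = [i for i in range(n_features) if relevant(i)]

print(minimal_optimal)  # [0] -- one copy of the signal suffices
print(all_relevant)     # [0, 1] -- both copies carry the signal
```

The minimal-optimal solution keeps a single copy of the duplicated signal, while the all-relevant solution returns both, which is exactly the behaviour Definitions 4 and 5 prescribe.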