harder, should be easier when all relevant variables are identified and hence the
number of variables is reduced.
For example, in our toy problem a perfect algorithm for all-relevant feature selection should find that 10 variables out of 1,000 are somehow connected with the decision variable; the number of models tested in the brute-force approach can therefore be reduced to 45. In a medical problem, where a researcher studies the connection between gene expression and a medical condition, the number of genes to consider may be reduced from many thousands to hundreds, tens, or perhaps even a handful of variables. Domain-specific knowledge can then be applied to build a model of the problem under scrutiny.
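The size of this reduction can be checked directly, under the assumption that the brute-force search fits one candidate model per pair of variables (the figure of 45 equals the number of pairs among 10 variables):

```python
from math import comb

# Brute-force model search over all pairs of variables (an assumption
# about the toy problem: each candidate model uses two variables).
all_pairs = comb(1000, 2)      # pairs among all 1,000 candidate variables
relevant_pairs = comb(10, 2)   # pairs among the 10 relevant variables

print(all_pairs)       # 499500 models before feature selection
print(relevant_pairs)  # 45 models after feature selection
```

Restricting the search to the relevant variables thus shrinks the model space by four orders of magnitude.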
2.1.1 Definitions
Up to this point the notion of relevance has been used without a definition; instead we relied on its intuitive understanding. However, it has already been observed by Kohavi and John [7] that there are several definitions of relevance that may be contradictory and misleading. They proposed that two degrees of relevance (strong and weak) are required to encompass all notions usually associated with this term. In their approach relevance is defined in absolute terms, with the help of an ideal Bayes classifier.
Definition 1 A feature X is strongly relevant when removal of X alone from the data
always results in deterioration of the prediction accuracy of the ideal Bayes classifier.
Definition 2 A feature X is weakly relevant if it is not strongly relevant and there exists a subset of features S such that the performance of the ideal Bayes classifier on S is worse than its performance on S ∪ {X}.
Definition 3 A feature X is irrelevant if it is neither strongly nor weakly relevant.
One should note that an information system might be constructed in such a way that there are no strongly relevant attributes. Indeed, it is easy to notice that the toy system described above does not contain strongly relevant attributes.
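The distinction between strong and weak relevance can be made concrete on a minimal hypothetical distribution (not the toy system itself, but a stand-in illustrating the same effect): the class equals a feature x1, and x2 is an exact copy of x1. An ideal Bayes classifier can be evaluated exactly by enumeration:

```python
# Tiny joint distribution over ((x1, x2), y): x2 duplicates x1 and y = x1.
# Every informative attribute has a perfect copy, so neither attribute is
# strongly relevant, yet both are weakly relevant.
samples = [((x, x), x) for x in (0, 1)]  # uniform weights

def bayes_accuracy(feature_idx):
    """Accuracy of the ideal Bayes classifier using the listed features."""
    groups = {}
    for features, y in samples:
        key = tuple(features[i] for i in feature_idx)
        groups.setdefault(key, []).append(y)
    # For each observed feature value the ideal classifier predicts the
    # majority class of that group.
    correct = sum(max(ys.count(0), ys.count(1)) for ys in groups.values())
    return correct / len(samples)

print(bayes_accuracy([0, 1]))  # 1.0 -- both features present
print(bayes_accuracy([1]))     # 1.0 -- removing x1 costs nothing: not strongly relevant
print(bayes_accuracy([]))      # 0.5 -- yet adding x1 to S = {} helps: weakly relevant
```

Removing x1 alone never hurts (Definition 1 fails), but there is a subset S = {} on which adding x1 improves the classifier (Definition 2 holds); the same argument applies to x2 by symmetry.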
Other useful notions were introduced by Nilsson et al. [13], who used the concepts of weakly and strongly relevant features to formally define two problems of feature selection. The goal of the minimal-optimal problem in feature selection is to find the minimal set of attributes giving the best possible classifier. The other is the all-relevant problem, in which one is interested in finding all strongly and weakly relevant attributes.
Definition 4 ( Minimal-optimal problem ) Find a set of attributes consisting of all strongly relevant attributes and a subset of the weakly relevant attributes such that all remaining weakly relevant attributes contain only redundant information.
Definition 5 ( All-relevant problem ) Find all strongly relevant and all weakly relevant attributes.
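On a small enough system both problems can be solved by exhaustive search, which makes the difference between them visible. The sketch below reuses the hypothetical duplicated-feature distribution (x2 copies x1, y = x1) and an exact ideal Bayes classifier; it is an illustration of the two definitions, not of any particular algorithm from the literature:

```python
from itertools import combinations

# Hypothetical distribution: ((x1, x2), y) with x2 = x1 and y = x1.
samples = [((x, x), x) for x in (0, 1)]
n_features = 2

def bayes_accuracy(feature_idx):
    """Exact accuracy of the ideal Bayes classifier on the given features."""
    groups = {}
    for features, y in samples:
        key = tuple(features[i] for i in feature_idx)
        groups.setdefault(key, []).append(y)
    return sum(max(ys.count(0), ys.count(1)) for ys in groups.values()) / len(samples)

all_subsets = [list(s) for r in range(n_features + 1)
               for s in combinations(range(n_features), r)]
best = max(bayes_accuracy(s) for s in all_subsets)

# Minimal-optimal problem: a smallest subset reaching the best accuracy.
minimal_optimal = min((s for s in all_subsets if bayes_accuracy(s) == best),
                      key=len)

# All-relevant problem: every attribute that improves the classifier when
# added to some subset of the remaining attributes (weak relevance; strong
# relevance is the special case where S contains all other attributes).
def relevant(i):
    others = [j for j in range(n_features) if j != i]
    return any(bayes_accuracy(list(s) + [i]) > bayes_accuracy(list(s))
               for r in range(len(others) + 1)
               for s in combinations(others, r))

all_relevant = [i for i in range(n_features) if relevant(i)]

print(minimal_optimal)  # [0] -- one copy of the signal suffices
print(all_relevant)     # [0, 1] -- both copies carry the signal
```

The minimal-optimal solution keeps a single copy of the duplicated signal, while the all-relevant solution returns both, which is exactly the behaviour Definitions 4 and 5 prescribe.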