It has been shown by Nilsson and co-workers that an exact solution of the all-relevant
problem requires an exhaustive search, which is intractable for all but the smallest
systems.
The relevance defined earlier is a qualitative notion: a feature is either relevant
or irrelevant. It is also an objective property of the system under scrutiny,
independent of the classifier used for building a model. This notion is distinct
from the importance of a variable, which is a quantitative and classifier-dependent
measure of the contribution of a variable to a model of the system. One can use various
measures of variable importance, provided that they satisfy a simple condition: the
importance of relevant variables should be higher than the importance of irrelevant ones.
A useful and intuitive measure of importance was introduced by Breiman in the random
forest (RF) classification algorithm [2].
Definition 6 (Importance of a variable) The importance of a variable is the loss of
classification accuracy of the model built using this variable when the information
on the variable's value is withdrawn.
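Definition 6 can be sketched in code. The snippet below is only an illustration, not the original method: Breiman's measure is defined for a random forest, while here a simple nearest-centroid classifier stands in for the model, and the toy data set (two informative features plus three noise features) is hypothetical. The information on a variable is "withdrawn" by permuting its column in the test set.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: features 0 and 1 carry the signal, 2-4 are noise.
n = 400
X = rng.normal(size=(n, 5))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
X_tr, y_tr, X_te, y_te = X[:300], y[:300], X[300:], y[300:]

# Stand-in classifier (nearest class centroid), used only to illustrate the
# measure; the original definition uses a random forest.
mu0 = X_tr[y_tr == 0].mean(axis=0)
mu1 = X_tr[y_tr == 1].mean(axis=0)

def accuracy(X, y):
    pred = (np.linalg.norm(X - mu1, axis=1)
            < np.linalg.norm(X - mu0, axis=1)).astype(int)
    return (pred == y).mean()

baseline = accuracy(X_te, y_te)

# Importance of variable j: drop in accuracy after withdrawing the
# information in column j by permuting its values between objects.
importance = []
for j in range(X_te.shape[1]):
    X_perm = X_te.copy()
    X_perm[:, j] = rng.permutation(X_perm[:, j])
    importance.append(baseline - accuracy(X_perm, y_te))
```

With this construction the informative features receive a clearly positive importance, while the noise features hover around zero, which is exactly the property required of an importance measure above.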
A final concept that will be used often enough in the current chapter to deserve a
mention in this section is the contrast variable.
Definition 7 (Contrast variable) A contrast variable is a descriptive variable that,
by design, does not carry information on the decision variable.
It is added to the system in order to discern relevant from irrelevant variables. It may
be obtained by drawing from a theoretically justified probability distribution, e.g. normal
or uniform; it may also be obtained from real variables by randomly permuting their
values between objects. The application of contrast variables to feature selection
was first proposed by Stoppiglia et al. [15] and then independently by Tuv et al. [17]
and Rudnicki et al. [14].
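The permutation-based construction of contrast variables can be sketched as follows; the function name is hypothetical, but the operation is the one described above: each real variable is copied and its values are shuffled between objects, destroying any relation to the decision variable while preserving the marginal distribution.

```python
import numpy as np

def add_contrast_variables(X, rng):
    """Append one contrast ("shadow") variable per real variable: the same
    values, randomly permuted between objects, so any relation to the
    decision variable is destroyed while the marginal distribution is kept."""
    shadows = np.column_stack(
        [rng.permutation(X[:, j]) for j in range(X.shape[1])]
    )
    return np.hstack([X, shadows])

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
X_aug = add_contrast_variables(X, rng)  # 100 x 6: originals, then shadows
```

Because a contrast variable is irrelevant by construction, the importances it attains provide a natural reference level: real variables whose importance does not exceed that of the contrasts can be treated as irrelevant.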
One may notice that any all-relevant feature selection algorithm is a special type of
classification algorithm: it assigns variables to one of two classes, relevant or
non-relevant. Hence the performance of such algorithms can be measured using the same
quantities that are used for evaluating ordinary classifiers. Two measures are
particularly useful: sensitivity S and positive predictive value PPV. Sensitivity S
is measured as
S = TP/(TP + FN), (2.1)
where TP is the number of truly relevant features recognised by an algorithm, FN is the
number of truly relevant features that are not recognised by the algorithm, and FP is the
number of non-relevant features that are incorrectly recognised as relevant. The positive
predictive value PPV is measured as
PPV = TP/(TP + FP). (2.2)
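A small numeric illustration of Eqs. (2.1) and (2.2); the function names and counts are hypothetical, chosen only to show the arithmetic: an algorithm that finds 8 of 10 truly relevant features while also flagging 2 irrelevant ones.

```python
def sensitivity(tp, fn):
    # Eq. (2.1): fraction of the truly relevant features that were recognised.
    return tp / (tp + fn)

def positive_predictive_value(tp, fp):
    # Eq. (2.2): fraction of the features reported as relevant that truly are.
    return tp / (tp + fp)

# Hypothetical run: 8 of 10 relevant features found, plus 2 false positives.
s = sensitivity(tp=8, fn=2)                  # 0.8
ppv = positive_predictive_value(tp=8, fp=2)  # 0.8
```

Note that the two measures answer complementary questions: S penalises missed relevant features, while PPV penalises irrelevant features reported as relevant.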