one feature till we have the entire set. Going in the opposite direction, at each step one variable is eliminated, and at the end of this search path the set becomes empty. The former algorithm is called sequential forward selection, while the latter is called sequential backward reduction or elimination (or selection). A variation of the two approaches commences with some non-empty set, both adding features to it and removing them.
Whichever starting point is selected, we need to decide on a search direction, forward or backward, and on the limitations imposed on the procedure. Instead of checking all available options, a more popular and realistic approach is to apply some greedy methodology, where feature selection is executed stepwise: at each stage the considered subset of attributes is evaluated in its local context, and the addition or removal of a feature depends on whether this action results in increased performance. With this kind of processing, if we conduct it from the beginning to the end without introducing any other stopping criteria, the number of tested subsets equals:
$$N + (N-1) + (N-2) + \cdots + 2 + 1 \;=\; \sum_{i=0}^{N-1} (N-i) \;=\; \frac{N(N+1)}{2} \qquad (5.1)$$
which is a more manageable, polynomial form compared to the exponential relationship given before.
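As a concrete illustration, the greedy forward variant can be sketched as follows. This is a minimal sketch, not the book's own implementation: it assumes scikit-learn as the inducer, cross-validated accuracy as the performance estimate, and an illustrative helper named evaluate(). With the stopping condition removed, the loop tests exactly N + (N-1) + ... + 1 = N(N+1)/2 subsets, as counted in Eq. (5.1).

```python
# Greedy sequential forward selection in the wrapper mode (sketch).
# Assumptions: scikit-learn is available; evaluate() is an illustrative
# helper scoring a subset by cross-validated accuracy of the inducer.
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
N = X.shape[1]

def evaluate(subset):
    """Cross-validated accuracy of the inducer on the given feature subset."""
    clf = KNeighborsClassifier()
    return cross_val_score(clf, X[:, subset], y, cv=5).mean()

selected, remaining = [], list(range(N))
best_score = 0.0
while remaining:
    # Greedy step: evaluate every one-feature extension in the local context.
    scores = {f: evaluate(selected + [f]) for f in remaining}
    f_best = max(scores, key=scores.get)
    if scores[f_best] <= best_score:   # no increase in performance: stop
        break
    selected.append(f_best)
    remaining.remove(f_best)
    best_score = scores[f_best]

print("selected features:", selected, "score:", best_score)
```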
The condition of increased performance uses the concept of relevance understood as incremental usefulness of attributes. This requirement could be considered too strong, especially in the case of backward elimination. It could be argued that if the predictive accuracy is the same regardless of the presence or absence of some variable in the considered subset, then this variable is irrelevant for the task and can be disregarded, which makes the condition weaker.
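The weaker condition changes only the acceptance test. The sketch below, reusing the illustrative evaluate() helper from the forward-selection example above, drops a feature whenever its removal does not decrease performance (>=), rather than only when it strictly increases it (>).

```python
# Backward elimination with the weaker relevance condition (sketch).
# Reuses N and the hypothetical evaluate() from the previous example.
subset = list(range(N))
score = evaluate(subset)
improved = True
while improved and len(subset) > 1:
    improved = False
    for f in list(subset):
        candidate = [g for g in subset if g != f]
        s = evaluate(candidate)
        if s >= score:   # accuracy unchanged or better: f is irrelevant here
            subset, score, improved = candidate, s, True
            break

print("retained features:", subset, "score:", score)
```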
When subsets of features are evaluated directly by the quality of prediction, the wrapper approach is employed [18]. An alternative is to use measures that are separate from and independent of the system responsible for discriminating the classes present in the training data, for example elements of information theory such as information gain, entropy, or consistency [9].
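To make the filter-style alternative concrete, the following self-contained sketch scores a single discrete attribute by information gain, one of the information-theoretic measures just mentioned; the toy data and helper names are illustrative only.

```python
# Information gain of a discrete attribute with respect to class labels:
# entropy of the labels minus the conditional entropy given the attribute.
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(values, labels):
    n = len(labels)
    cond = 0.0
    for v in set(values):
        part = [l for x, l in zip(values, labels) if x == v]
        cond += len(part) / n * entropy(part)
    return entropy(labels) - cond

# Toy example: a binary attribute that fully determines a binary class.
attr   = ['a', 'a', 'b', 'b', 'a', 'b']
labels = [ 1,   1,   0,   0,   1,   0 ]
print(information_gain(attr, labels))   # prints 1.0
```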
Sequential selection procedures, whether forward or backward, executed in the wrapper mode explicitly return information on how useful individual attributes are for the employed inducer and show how it prefers some variables over others. This can be interpreted as a scoring function that assigns specific weights and organises features into a specific ordering, which is in fact their ranking. In forward selection the most important variables are selected first; in backward reduction the least important features are the first to be discarded. This importance of attributes is always considered in the local context, from the current perspective of the confines of the search path.
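Reading the selection order as a ranking can be sketched in a few lines; again this reuses the illustrative evaluate() helper and runs the forward search to completion, so all N(N+1)/2 subsets of Eq. (5.1) are tested and every feature receives a position.

```python
# Deriving a feature ranking from the order of greedy forward selection.
# Reuses N and the hypothetical evaluate() from the earlier sketch.
ranking, remaining = [], list(range(N))
while remaining:
    f_best = max(remaining, key=lambda f: evaluate(ranking + [f]))
    ranking.append(f_best)   # earlier position = higher local importance
    remaining.remove(f_best)

print("ranking (most to least important):", ranking)
```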