one feature till we have the entire set. Going in the opposite direction, at each step one variable is eliminated, and at the end of this search path the set becomes empty. The former algorithm is called sequential forward selection, while the latter is called sequential backward reduction or elimination (or selection). A variation of the two approaches commences with some non-empty set, both adding features to it and removing them.
Whichever starting point is selected, we need to decide on a search direction, forward or backward, and on the limitations imposed on the procedure. Instead of checking all available options, a more popular and realistic approach is to apply some greedy methodology, where feature selection is executed stepwise: at each stage the considered subset of attributes is evaluated in its local context, and the addition or removal of a feature depends on whether this action results in increased performance. With this kind of processing, if we conduct it from the beginning to the end without introducing any other stopping criteria, the number of tested subsets equals:
$$N + (N-1) + (N-2) + \cdots + 2 + 1 \;=\; \sum_{i=0}^{N-1} (N-i) \;=\; \frac{N(N+1)}{2} \qquad (5.1)$$
which is a more manageable, polynomial form compared to the exponential relationship given before.
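As a concrete illustration, the greedy forward variant can be sketched as follows. This is a minimal sketch, not the book's own implementation: it assumes scikit-learn as the inducer, cross-validated accuracy as the performance estimate, and an illustrative helper named evaluate(). With the stopping condition removed, the loop tests exactly N + (N-1) + ... + 1 = N(N+1)/2 subsets, as counted in Eq. (5.1).

```python
# Greedy sequential forward selection in the wrapper mode (sketch).
# Assumptions: scikit-learn is available; evaluate() is an illustrative
# helper scoring a subset by cross-validated accuracy of the inducer.
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
N = X.shape[1]

def evaluate(subset):
    """Cross-validated accuracy of the inducer on the given feature subset."""
    clf = KNeighborsClassifier()
    return cross_val_score(clf, X[:, subset], y, cv=5).mean()

selected, remaining = [], list(range(N))
best_score = 0.0
while remaining:
    # Greedy step: evaluate every one-feature extension in the local context.
    scores = {f: evaluate(selected + [f]) for f in remaining}
    f_best = max(scores, key=scores.get)
    if scores[f_best] <= best_score:   # no increase in performance: stop
        break
    selected.append(f_best)
    remaining.remove(f_best)
    best_score = scores[f_best]

print("selected features:", selected, "score:", best_score)
```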
The condition of increased performance uses the concept of relevance understood as incremental usefulness of attributes. This requirement could be considered too strong, especially in the case of backward elimination. It could be argued that if the predictive accuracy is the same regardless of the presence or absence of some variable in the considered subset, then this variable is irrelevant for the task and can be disregarded, which makes the condition weaker.
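The weaker condition changes only the acceptance test. The sketch below, reusing the illustrative evaluate() helper from the forward-selection example above, drops a feature whenever its removal does not decrease performance (>=), rather than only when it strictly increases it (>).

```python
# Backward elimination with the weaker relevance condition (sketch).
# Reuses N and the hypothetical evaluate() from the previous example.
subset = list(range(N))
score = evaluate(subset)
improved = True
while improved and len(subset) > 1:
    improved = False
    for f in list(subset):
        candidate = [g for g in subset if g != f]
        s = evaluate(candidate)
        if s >= score:   # accuracy unchanged or better: f is irrelevant here
            subset, score, improved = candidate, s, True
            break

print("retained features:", subset, "score:", score)
```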
When subsets of features are evaluated directly by the quality of prediction, the wrapper approach is employed [18]. An alternative is to use measures that are separate from and independent of the system responsible for discriminating the classes present in the training data, for example elements of information theory such as information gain, entropy, or consistency [9].
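To make the filter-style alternative concrete, the following self-contained sketch scores a single discrete attribute by information gain, one of the information-theoretic measures just mentioned; the toy data and helper names are illustrative only.

```python
# Information gain of a discrete attribute with respect to class labels:
# entropy of the labels minus the conditional entropy given the attribute.
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(values, labels):
    n = len(labels)
    cond = 0.0
    for v in set(values):
        part = [l for x, l in zip(values, labels) if x == v]
        cond += len(part) / n * entropy(part)
    return entropy(labels) - cond

# Toy example: a binary attribute that fully determines a binary class.
attr   = ['a', 'a', 'b', 'b', 'a', 'b']
labels = [ 1,   1,   0,   0,   1,   0 ]
print(information_gain(attr, labels))   # prints 1.0
```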
Sequential selection procedures, whether forward or backward, executed in the wrapper mode explicitly return information on how useful individual attributes are for the employed inducer and show how it prefers some variables over others. This can be interpreted as a scoring function that assigns specific weights and organises features into a specific ordering, which is in fact their ranking. In forward selection the most important variables are selected first; in backward reduction the least important features are the first to be discarded. This importance of attributes is always considered in the local context, from the current perspective of the confines of the search path.
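Reading the selection order as a ranking can be sketched in a few lines; again this reuses the illustrative evaluate() helper and runs the forward search to completion, so all N(N+1)/2 subsets of Eq. (5.1) are tested and every feature receives a position.

```python
# Deriving a feature ranking from the order of greedy forward selection.
# Reuses N and the hypothetical evaluate() from the earlier sketch.
ranking, remaining = [], list(range(N))
while remaining:
    f_best = max(remaining, key=lambda f: evaluate(ranking + [f]))
    ranking.append(f_best)   # earlier position = higher local importance
    remaining.remove(f_best)

print("ranking (most to least important):", ranking)
```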