Weighting of Features by Sequential Selection - Feature Selection for Data and Pattern Recognition - page 82

Information Technology Reference

In-Depth Information

Fig. 5.3 ANN classification accuracy observed in sequential backward elimination process, in

relation to the number of considered features, for each average there is indicated maximal and

minimal performance

on examples, the full algorithm with hard constraints on minimal rule support gives

much better results of 76.67%. Yet this full algorithm contains over 46 thousands of

constituent decision rules and calculation takes a noticeable amount of time. Genera-

tion of this type of algorithm at all stages of backward elimination of features would

be far too much time-consuming. Instead, another approach is employed, focused on

the previously inferred all rules on examples algorithm for the entire set of attributes.

Decision rules included in the full algorithm have varying lengths equal to

the numbers of conditions in the premise part, varying supports, refer to various

attributes. If we were to employ backward elimination procedures and attempted to

induce all rules on examples algorithms for subsets of variables in subsequent reduc-

tion stages it is reasonable to expect that at least a part of newly inferred rules would

be the same as those already found. Therefore, rather than waste time on such simply

repetitive computations we can apply the backward selection to the full algorithm

itself in the following manner.

In the first step we disregard all rules with conditions on a specific variable, itera-

tively for all variables. As a result we obtain 25 different reduced decision algorithms,

which are then tested and the one with the highest classification accuracy is selected.

We reject the attribute, elimination of which have resulted in this algorithm, and use

this limited set of decision rules as the input to the second stage of processing, where

24 reduced algorithms are tested, and so on, with reduction of one feature and all

rules referring to it at each stage. The process can continue till the point of detecting

some significantly worse performance, rejecting all rules from the algorithm (even

though there are still some features left to be considered), or reducing all conditional

attributes. With this last stopping criterion there is obtained weighting of variables

which can be used for other purposes. The procedure of discarding rules governed

by included attributes can be perceived as rule filtering.

The detailed results from execution of described methodology are shown in

Table 5.3 , in which the ranking of features is given in the (h) column.

Next Page

Feature Selection for Data and Pattern Recognition

Search WWH ::

Custom Search

Home