Rough set processing possesses an inherent mechanism for dimensionality reduction in the concept of relative reducts [27]. Relative reducts are subsets of attributes that offer the same predictive accuracy for the considered samples as the entire set of attributes. When a reduct is applied, some of the variables are excluded from the rule induction phase. If the intersection of all reducts, called the core, is non-empty, it contains all features that are necessary for classification, although they are not necessarily sufficient. It also often happens that there are many reducts and no indication as to which one should be applied [26]. Reducts can then be used indirectly, as a source of additional information on individual attributes, reflecting their importance for a task [36, 41].
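To make these notions concrete, the following Python sketch (a brute-force illustration, not an algorithm from [27]) finds the relative reducts of a toy decision table and derives the core as their intersection; the table, attribute names, and helper functions are all hypothetical.

```python
from itertools import combinations

def classifies_consistently(samples, attrs):
    """Check that samples identical on attrs never carry different decisions."""
    seen = {}
    for row, decision in samples:
        key = tuple(row[a] for a in attrs)
        if seen.setdefault(key, decision) != decision:
            return False
    return True

def relative_reducts(samples, all_attrs):
    """Brute-force search: minimal attribute subsets preserving the classification."""
    reducts = []
    for k in range(1, len(all_attrs) + 1):
        for subset in combinations(all_attrs, k):
            if any(set(r) <= set(subset) for r in reducts):
                continue  # proper supersets of a reduct are not minimal
            if classifies_consistently(samples, subset):
                reducts.append(subset)
    return reducts

# Toy decision table: (attribute-value dict, decision class)
samples = [
    ({"a": 1, "b": 0, "c": 1}, "yes"),
    ({"a": 1, "b": 1, "c": 0}, "yes"),
    ({"a": 0, "b": 0, "c": 1}, "no"),
    ({"a": 0, "b": 1, "c": 0}, "no"),
]
reducts = relative_reducts(samples, ["a", "b", "c"])
core = set.intersection(*map(set, reducts)) if reducts else set()
print(reducts, core)  # here a single reduct [('a',)], so the core is {'a'}
```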
Predictive accuracy of a rule classifier depends not only on the input data from which the constituent decision rules are inferred, but also, to a very high degree, on the selected approach to rule induction [5]. Possibly the quickest (yet not the simplest) approach is induction of a minimal cover: only a small number of rules is found, just sufficient to classify correctly all learning samples. However, rules inferred with this approach are not necessarily the best. Taking into consideration, for example, rule support, a parameter stating for how many training samples a rule is valid, it may turn out that other rule induction algorithms can find more interesting rules [34]. Generation of all rules on examples is the opposite of the minimal cover approach; it yields good, bad, and average rules alike, but at the cost of higher computational complexity and extended processing. If this cost can be afforded, induction of all rules and their analysis makes it possible to tailor the decision algorithm to specific requirements [37, 38]. Once a set of rules is induced, some elements can be filtered out using quality measures.
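As an illustration of such filtering, the sketch below discards induced rules whose support falls under a threshold; the rule representation, the toy data, and the reading of support as the number of training samples for which a rule is valid (premise matched, decision correct) are assumptions made for this example.

```python
def rule_support(conditions, decision, samples):
    """Support: training samples matching the premise and the decision class."""
    return sum(
        dec == decision and
        all(row.get(a) == v for a, v in conditions.items())
        for row, dec in samples
    )

def filter_rules(rules, samples, min_support):
    """Keep only rules whose support reaches the given threshold."""
    return [(c, d) for c, d in rules
            if rule_support(c, d, samples) >= min_support]

# Toy training data: (attribute-value dict, decision class)
samples = [
    ({"a": 1, "b": 0}, "yes"),
    ({"a": 1, "b": 1}, "yes"),
    ({"a": 0, "b": 0}, "no"),
]
# Hypothetical rules induced earlier: (conditions, decision class)
rules = [({"a": 1}, "yes"), ({"b": 0}, "no")]
print(filter_rules(rules, samples, min_support=2))  # keeps only ({'a': 1}, 'yes')
```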
Calculation of all rules on examples for sequential backward elimination of variables, even for a relatively small number of them, is a task of unmanageable proportions, since each elimination step requires re-inducing the complete rule set for every candidate attribute.
When the number of attributes is low, inferring rules takes distinctly less time, which allows a sequential forward selection procedure to be employed. However, the differences in performance between algorithms found in the initial stages can be so small that, to choose the best one, not only predictive accuracy is taken into account but also other parameters, for example the number of rules in the algorithm and their type. The exact (certain) rules are the most useful for classification, as they classify unambiguously. Possible and approximate rules point only to possible inclusion in some class or a union of classes, which does not help in increasing recognition without further processing.
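The forward selection procedure mentioned above can be sketched as a greedy wrapper loop; `induce_and_score` below is a hypothetical stand-in for the actual rule induction and evaluation pipeline, and in practice ties would be broken by secondary parameters such as the number and type of rules.

```python
def forward_selection(all_attrs, induce_and_score):
    """Greedy wrapper: grow the attribute subset one feature at a time,
    keeping the candidate that maximises classifier accuracy."""
    selected, best_score = [], 0.0
    while True:
        candidates = [a for a in all_attrs if a not in selected]
        if not candidates:
            return selected
        scored = [(induce_and_score(selected + [a]), a) for a in candidates]
        score, attr = max(scored, key=lambda t: t[0])
        if score <= best_score:  # no candidate improves the classifier
            return selected
        selected.append(attr)
        best_score = score

# Stand-in scorer: in practice this would induce rules on the reduced table
# and return the predictive accuracy of the resulting classifier.
fake_accuracy = {"a": 0.70, "b": 0.55, "c": 0.40}
score = lambda attrs: min(0.95, sum(fake_accuracy[a] for a in attrs))
print(forward_selection(["a", "b", "c"], score))  # ['a', 'b']
```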
Classification results for rule classifiers fall into three groups of decisions: correct, incorrect, and ambiguous. The last of these covers cases where there are several rules with contradicting verdicts, or no matching rules at all. In the situation of contradicting verdicts, the popular approach is to execute some kind of voting, either by simple majority or with weighting of rules, for example by their support, as it can be argued that rules with higher support are more important [39].
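A minimal sketch of such support-weighted voting, assuming rules stored as hypothetical (conditions, decision, support) triples, could look as follows; an empty or tied vote yields the ambiguous outcome.

```python
def classify(sample, rules):
    """Support-weighted voting among matching rules.

    Returns the winning class, or None when no rule matches or the
    vote ends in a tie (the 'ambiguous' outcome)."""
    votes = {}
    for conditions, decision, support in rules:
        if all(sample.get(a) == v for a, v in conditions.items()):
            votes[decision] = votes.get(decision, 0) + support
    if not votes:
        return None  # no matching rule
    ranked = sorted(votes.items(), key=lambda kv: kv[1], reverse=True)
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None  # contradicting verdicts with equal total weight
    return ranked[0][0]

# Hypothetical rules: (conditions, decision class, support)
rules = [
    ({"a": 1}, "yes", 3),
    ({"b": 0}, "no", 1),
]
print(classify({"a": 1, "b": 0}, rules))  # "yes": support 3 outvotes 1
```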