Rough set processing possesses an inherent mechanism for dimensionality reduction in the concept of relative reducts [27]. Relative reducts are subsets of attributes that offer the same predictive accuracy for the considered samples as the entire set of attributes. When a reduct is applied, some of the variables are excluded from the rule induction phase. If the intersection of all reducts, called the core, is non-empty, it contains all features that are necessary for classification, although they are not necessarily sufficient. It also often happens that there are many reducts and no indication as to which one should be applied [26]. Reducts can then be used indirectly, as a source of additional information on individual attributes, reflecting their importance for a task [36, 41].
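To make these notions concrete, the following Python sketch (a brute-force illustration, not an algorithm from [27]) finds the relative reducts of a toy decision table and derives the core as their intersection; the table, attribute names, and helper functions are all hypothetical.

```python
from itertools import combinations

def classifies_consistently(samples, attrs):
    """Check that samples identical on attrs never carry different decisions."""
    seen = {}
    for row, decision in samples:
        key = tuple(row[a] for a in attrs)
        if seen.setdefault(key, decision) != decision:
            return False
    return True

def relative_reducts(samples, all_attrs):
    """Brute-force search: minimal attribute subsets preserving the classification."""
    reducts = []
    for k in range(1, len(all_attrs) + 1):
        for subset in combinations(all_attrs, k):
            if any(set(r) <= set(subset) for r in reducts):
                continue  # proper supersets of a reduct are not minimal
            if classifies_consistently(samples, subset):
                reducts.append(subset)
    return reducts

# Toy decision table: (attribute-value dict, decision class)
samples = [
    ({"a": 1, "b": 0, "c": 1}, "yes"),
    ({"a": 1, "b": 1, "c": 0}, "yes"),
    ({"a": 0, "b": 0, "c": 1}, "no"),
    ({"a": 0, "b": 1, "c": 0}, "no"),
]
reducts = relative_reducts(samples, ["a", "b", "c"])
core = set.intersection(*map(set, reducts)) if reducts else set()
print(reducts, core)  # here a single reduct [('a',)], so the core is {'a'}
```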
Predictive accuracy of a rule classifier depends not only on the input data from which the constituent decision rules are inferred, but also, to a very high degree, on the selected approach to rule induction [5]. Possibly the quickest (yet not the simplest) approach is induction of a minimal cover: only a small number of rules is found, just sufficient to classify correctly all learning samples. However, rules inferred with this approach are not necessarily the best. Taking into consideration, for example, rule support, a parameter stating for how many training samples a rule is valid, it may turn out that other rule induction algorithms can find more interesting rules [34]. Generation of all rules on examples is the opposite of the minimal cover approach; it yields good, bad, and average rules alike, but at the cost of higher computational complexity and extended processing. If this cost can be afforded, induction of all rules and their analysis makes it possible to tailor the decision algorithm to specific requirements [37, 38]. Once a set of rules is induced, some elements can be filtered out using quality measures.
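As an illustration of such filtering, the sketch below discards induced rules whose support falls under a threshold; the rule representation, the toy data, and the reading of support as the number of training samples for which a rule is valid (premise matched, decision correct) are assumptions made for this example.

```python
def rule_support(conditions, decision, samples):
    """Support: training samples matching the premise and the decision class."""
    return sum(
        dec == decision and
        all(row.get(a) == v for a, v in conditions.items())
        for row, dec in samples
    )

def filter_rules(rules, samples, min_support):
    """Keep only rules whose support reaches the given threshold."""
    return [(c, d) for c, d in rules
            if rule_support(c, d, samples) >= min_support]

# Toy training data: (attribute-value dict, decision class)
samples = [
    ({"a": 1, "b": 0}, "yes"),
    ({"a": 1, "b": 1}, "yes"),
    ({"a": 0, "b": 0}, "no"),
]
# Hypothetical rules induced earlier: (conditions, decision class)
rules = [({"a": 1}, "yes"), ({"b": 0}, "no")]
print(filter_rules(rules, samples, min_support=2))  # keeps only ({'a': 1}, 'yes')
```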
Calculation of all rules on examples for sequential backward elimination of variables, even for a relatively small number of them, is a task of unmanageable proportions, since each elimination step requires re-inducing the complete rule set for every candidate attribute.
When the number of attributes is low, inferring rules takes distinctly less time, which allows a sequential forward selection procedure to be employed. However, the differences in performance between algorithms found in the initial stages can be so small that, to choose the best one, not only predictive accuracy is taken into account but also other parameters, for example the number of rules in the algorithm and their type. The exact (certain) rules are the most useful for classification, as they classify unambiguously. Possible and approximate rules point only to possible inclusion in some class or a union of classes, which does not help in increasing recognition without further processing.
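The forward selection procedure mentioned above can be sketched as a greedy wrapper loop; `induce_and_score` below is a hypothetical stand-in for the actual rule induction and evaluation pipeline, and in practice ties would be broken by secondary parameters such as the number and type of rules.

```python
def forward_selection(all_attrs, induce_and_score):
    """Greedy wrapper: grow the attribute subset one feature at a time,
    keeping the candidate that maximises classifier accuracy."""
    selected, best_score = [], 0.0
    while True:
        candidates = [a for a in all_attrs if a not in selected]
        if not candidates:
            return selected
        scored = [(induce_and_score(selected + [a]), a) for a in candidates]
        score, attr = max(scored, key=lambda t: t[0])
        if score <= best_score:  # no candidate improves the classifier
            return selected
        selected.append(attr)
        best_score = score

# Stand-in scorer: in practice this would induce rules on the reduced table
# and return the predictive accuracy of the resulting classifier.
fake_accuracy = {"a": 0.70, "b": 0.55, "c": 0.40}
score = lambda attrs: min(0.95, sum(fake_accuracy[a] for a in attrs))
print(forward_selection(["a", "b", "c"], score))  # ['a', 'b']
```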
Classification results for rule classifiers fall into three groups of decisions: correct, incorrect, and ambiguous. The last of these covers cases where there are several rules with contradicting verdicts, or no matching rules at all. In the situation of contradicting verdicts, the popular approach is to execute some kind of voting, either by simple majority or with weighting of rules, for example by their support, as it can be argued that rules with higher support are more important [39].
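A minimal sketch of such support-weighted voting, assuming rules stored as hypothetical (conditions, decision, support) triples, could look as follows; an empty or tied vote yields the ambiguous outcome.

```python
def classify(sample, rules):
    """Support-weighted voting among matching rules.

    Returns the winning class, or None when no rule matches or the
    vote ends in a tie (the 'ambiguous' outcome)."""
    votes = {}
    for conditions, decision, support in rules:
        if all(sample.get(a) == v for a, v in conditions.items()):
            votes[decision] = votes.get(decision, 0) + support
    if not votes:
        return None  # no matching rule
    ranked = sorted(votes.items(), key=lambda kv: kv[1], reverse=True)
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None  # contradicting verdicts with equal total weight
    return ranked[0][0]

# Hypothetical rules: (conditions, decision class, support)
rules = [
    ({"a": 1}, "yes", 3),
    ({"b": 0}, "no", 1),
]
print(classify({"a": 1, "b": 0}, rules))  # "yes": support 3 outvotes 1
```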