attribute another one was added, and 24 two-attribute subsets were prepared, with rules generated for each of them. Selection of the best algorithm again concluded the processing at this stage. The analogous procedure was executed at all subsequent stages.
At the first and all subsequent stages, four decision algorithms were generated for each considered subset: two with the conditional attributes of cost (decreasing) type and two of gain (increasing) type, for both the minimal cover and the all rules on examples algorithm. All possible and approximate rules were then excluded from the inferred algorithms, and each algorithm was tested with respect to its maximal predictive accuracy. To this end, hard constraints were imposed on the rules of all algorithms, requiring a minimal support for a rule to be taken into consideration in classification. In most cases these requirements resulted in increased performance. The details of the conducted experiments are listed in Table 5.1.
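The pruning step described above can be illustrated with a minimal sketch. The rule representation and the names used here (Rule, prune_rules, min_support, certain) are assumptions introduced for illustration only, not the representation used in the original experiments.

```python
from dataclasses import dataclass

@dataclass
class Rule:
    conditions: dict      # attribute -> required value
    decision: str         # decision class indicated by the rule
    support: int          # number of supporting training examples
    certain: bool = True  # False for possible/approximate rules

def prune_rules(rules, min_support):
    """Exclude possible and approximate rules, then apply the hard
    constraint on minimal support: only rules supported by at least
    min_support training examples are kept for classification."""
    return [r for r in rules if r.certain and r.support >= min_support]
```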
As ambiguous cases, that is, no rule matching a testing sample or matching rules with contradicting decisions, were always treated as incorrect classifications, the performance of these rule classifiers in the initial phase, when only a few features are considered, is rather poor. However, it increases quickly with each added attribute. When rules are inferred from just a few conditional attributes, the two types of algorithms, minimal cover and all rules on examples, do not differ much, with similar numbers of constituent rules and close performance levels. Once more features are involved, the differences become more distinct.
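The scoring convention for ambiguous cases can be sketched as follows. The sketch reuses the hypothetical Rule representation from above and only illustrates the convention of counting unmatched and contradicting cases as errors; it is not the original implementation.

```python
def classify(sample, rules):
    """Return the decision indicated by the matching rules, or None
    when the case is ambiguous: no rule matches the sample, or the
    matching rules point to contradicting decisions."""
    matched = [r for r in rules
               if all(sample.get(a) == v for a, v in r.conditions.items())]
    decisions = {r.decision for r in matched}
    if len(decisions) != 1:
        return None  # ambiguous case
    return decisions.pop()

def accuracy(test_set, rules):
    """Fraction of correctly classified samples, with ambiguous
    cases counted as incorrect."""
    correct = sum(classify(sample, rules) == label
                  for sample, label in test_set)
    return correct / len(test_set)
```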
The first local maximum is detected for the subset of just five attributes, for which the all rules on examples algorithm, limited by rejecting rules with support lower than 7, classifies 91.67% of samples correctly. The best performance of this type of algorithm for six variables is lower, 88.33%. Yet for the same subset the minimal cover algorithm has a predictive accuracy of 91.67%, which is kept at the same level also for seven features before it decreases to 83.33% for eight attributes. The performance of the best rule classifiers at each stage is shown in Fig. 5.2 for both minimal cover and all rules on examples decision algorithms, denoted as MCDA and FDA respectively.
In the forward selection approach, each iterative step of the procedure involves more and more variables, and at each step we can ask whether it is enough, that is, whether we already have a set of features that satisfies our requirements. The answer is not straightforward. Even when predictive accuracy is considered the most important factor on which the decision is based, it is not a simple task of reaching some maximum: upon finding one we cannot know whether it is local or global, and after decreased performance for some subsequent subset in the search path we can encounter another local maximum. We know the true maximum only when all possible subsets of attributes have been tested (all possible on the selected search path, which is not exhaustive), that is, including the entire set of available variables.
When we can afford the extended processing of search procedures executed without additional stopping criteria, the performance observed for subsets of variables of gradually increasing cardinalities can be used as a means of feature weighting and ranking, to be employed for another inducer as a kind of filter. Alternatively, we can conclude the variable selection procedure by choosing the subset of features for which the classification accuracy was the highest among all tested alternatives, as sketched below.
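A minimal sketch of such a forward search without a stopping criterion is given below. The helper names (forward_search, evaluate_subset, all_attributes) are assumptions; evaluate_subset stands in for the whole rule induction, pruning, and testing procedure and is assumed to return the best predictive accuracy obtained for a given attribute subset.

```python
def forward_search(all_attributes, evaluate_subset):
    """Greedy forward selection run to the full attribute set.

    Returns the per-step history of (subset, accuracy), which can
    serve for attribute ranking, together with the subset whose
    recorded accuracy was highest among all tested alternatives.
    """
    selected = []
    remaining = list(all_attributes)
    history = []  # (subset, accuracy) recorded at each step
    while remaining:
        # try adding each remaining attribute and keep the best extension
        scored = [(evaluate_subset(selected + [a]), a) for a in remaining]
        best_score, best_attr = max(scored)
        selected.append(best_attr)
        remaining.remove(best_attr)
        history.append((tuple(selected), best_score))
    best_subset, best_acc = max(history, key=lambda step: step[1])
    return history, best_subset, best_acc
```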