For ANN classifiers it is more natural to apply backward elimination, because networks with an excessive number of inputs can still perform better than those with insufficient features. During training, networks can detect by themselves which input variables are less important and assign low weights to their interconnections, minimising their influence on the outcome. On the other hand, when there are too few characteristic features, the network can try to generalise, yet conclusions cannot be drawn from nothing. As a result, neural networks with only a few inputs typically need more time to be trained, and can have trouble converging and then generalising to unknown samples.
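The wrapper-style backward elimination described above can be sketched as follows; the scoring function is a hypothetical stand-in for training and testing a classifier on a given feature subset:

```python
def backward_elimination(features, evaluate):
    """Greedily drop the feature whose removal hurts the score least,
    stopping once every possible removal lowers the score."""
    current = list(features)
    best_score = evaluate(current)
    improved = True
    while improved and len(current) > 1:
        improved = False
        for f in list(current):
            trial = [x for x in current if x != f]
            score = evaluate(trial)
            if score >= best_score:      # removal does not hurt performance
                best_score, current = score, trial
                improved = True
                break
    return current, best_score

# Toy score (illustrative only): features 'a' and 'c' matter,
# every extra feature adds a small noise penalty.
relevant = {'a', 'c'}
def toy_score(subset):
    return len(relevant & set(subset)) - 0.01 * len(set(subset) - relevant)

kept, score = backward_elimination(['a', 'b', 'c', 'd'], toy_score)
# kept == ['a', 'c']
```

In practice `evaluate` would wrap a full train-and-validate cycle of the chosen inducer, which is why backward elimination over many features is costly.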
Induction of decision rules takes significantly less time for fewer attributes. However, fewer attributes do not necessarily carry the information required to infer rules of good quality, yielding high predictive accuracy. By applying forward selection procedures we can not only choose the attributes that are most beneficial to the rule induction process, but also adjust their preference orders, which further increases performance. Typically, minimal cover decision algorithms give worse results than rule classifiers constructed with other approaches, for example all rules on examples with some hard constraints imposed on the constituent rules, such as a minimal required support. Yet inferring all rules when there are many attributes requires a lot of computation and takes time. Since in subsequent stages of backward elimination many of the generated rules would be the same, as the studied subsets of features overlap, we can employ another methodology, in which backward reduction is in fact applied to the rules referring to the rejected features.
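The shortcut just described can be illustrated with a small sketch: the full rule set is induced once, and whenever a feature is rejected during backward reduction, every rule referring to it is simply discarded instead of re-running induction. The rule representation and the rules themselves are hypothetical:

```python
# A rule is (conditions, decision), with conditions mapping
# attribute name -> required value.
all_rules = [
    ({'a': 1, 'b': 0}, 'yes'),
    ({'c': 1},         'yes'),
    ({'b': 1, 'c': 0}, 'no'),
    ({'d': 0},         'no'),
]

def reject_feature(rules, feature):
    """Keep only rules whose conditions never mention the rejected feature."""
    return [r for r in rules if feature not in r[0]]

after_b = reject_feature(all_rules, 'b')
# Only the rules on 'c' and 'd' survive the rejection of 'b'.
```

This trades a single expensive all-rules induction for cheap set filtering at every elimination step, which is exactly where the savings over repeated induction come from.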
For all search paths tested, one important element to consider is the stopping criterion, which answers the question of when or where the selection procedure should end. The answer is not trivial, as it depends to a high degree on the purpose of applying the search procedure in the first place. When the goal is simply to find a good subset of features, that is, one resulting in an induced solution with satisfyingly high predictive accuracy, we can stop the search once we detect a maximum in the correct recognition ratio. However, if we stop too quickly, before checking alternative subsets, it may turn out that the maximum is only local rather than global, and that for some other candidate subset of variables the predictive accuracy is better.
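One simple way to guard against stopping at a merely local maximum is a patience-based criterion: the search continues through a fixed number of non-improving steps before terminating. A minimal sketch, assuming the candidate subsets along the search path have already been scored:

```python
def select_with_patience(scores, patience=2):
    """Scan candidate-subset scores in search order; stop only after
    `patience` consecutive non-improving steps, so a single dip in
    accuracy does not end the search prematurely."""
    best_idx, best, waited = 0, scores[0], 0
    for i, s in enumerate(scores[1:], start=1):
        if s > best:
            best_idx, best, waited = i, s, 0
        else:
            waited += 1
            if waited >= patience:
                break
    return best_idx, best

# Accuracy dips at step 2 but recovers at step 3: stopping at the
# first non-improvement would have settled for 0.82.
idx, acc = select_with_patience([0.80, 0.82, 0.79, 0.86, 0.85])
```

Larger patience values check more alternative subsets at the cost of extra classifier evaluations; exhaustive testing of the whole path is the limiting case.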
If extended processing is acceptable, or when the goal is to weight the available variables, we test all possible subsets of variables in a search path. We observe the performance (after all, the choice is conditioned by it), but we also study the order in which the features are organised. This order reflects their weighting from the perspective of the applied search procedure and the inducer employed. As classifiers have different characteristics and the selection of variables is wrapped around their performance, the same search direction applied to another classification system with distinctly different properties may return a completely different ranking of attributes. From all validated subsets we can choose the best, or we can impose the obtained ranking of features on a separate classification process and test its usefulness as a filter.
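Turning a wrapper search into such a ranking can be sketched as follows; the elimination order is hypothetical (e.g. the order in which backward elimination discarded attributes), and a second classifier then simply keeps the top-ranked attributes without running its own search:

```python
def ranking_from_elimination(all_features, elimination_order):
    """Features eliminated later (or never) rank higher: survivors of
    the search come first, followed by the discarded features in
    reverse order of their elimination."""
    survivors = [f for f in all_features if f not in elimination_order]
    return survivors + list(reversed(elimination_order))

# 'b' was discarded first, then 'd'; 'a' and 'c' survived the search.
rank = ranking_from_elimination(['a', 'b', 'c', 'd'], ['b', 'd'])
top_2 = rank[:2]   # filter: the other classifier uses only these
```

Used this way the wrapper-derived order acts as a filter for the second classifier, which is cheap to apply but, as noted above, may suit that classifier poorly if its characteristics differ markedly from those of the inducer that produced the ranking.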
All attribute selection procedures were illustrated for a binary classification task with balanced data, namely the problem of authorship attribution based on stylometric processing of texts. The most important aim of textual analysis is to find definitions