Generality Is Predictive of Prediction Accuracy
Geoffrey I. Webb¹ and Damien Brain²

¹ Faculty of Information Technology, Monash University, Clayton, Vic 3800, Australia
webb@infotech.monash.edu.au

² UTelco Systems, Level 50/120 Collins St, Melbourne, Vic 3001, Australia
damien.brain@utelcosystems.com.au
Abstract. During knowledge acquisition it frequently occurs that multiple alternative potential rules all appear equally credible. This paper addresses the dearth of formal analysis about how to select between such alternatives. It presents two hypotheses about the expected impact of selecting between classification rules of differing levels of generality in the absence of other evidence about their likely relative performance on unseen data. We argue that the accuracy on unseen data of the more general rule will tend to be closer to that of a default rule for the class than will that of the more specific rule. We also argue that, in comparison to the more general rule, the accuracy of the more specific rule on unseen cases will tend to be closer to the accuracy obtained on training data. Experimental evidence is provided in support of these hypotheses. These hypotheses can be useful for selecting between rules in order to achieve specific knowledge acquisition objectives.
1 Introduction
In many knowledge acquisition contexts there will be many classification rules
that perform equally well on the training data. For example, as illustrated by
the version space [1], there will often be alternative rules of differing degrees
of generality, all of which agree with the training data. However, even when we do not expect to find rules that are strictly consistent with the training data, that is, when we allow rules to misclassify some training cases, there will often be many rules that all cover exactly the same training cases. If we are selecting rules to use for some decision-making task, we must choose between such rules with identical performance on the
training data. To do so requires a learning bias [2], a means of selecting between
competing hypotheses that utilizes criteria beyond those strictly encapsulated
in the training data.
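As a minimal illustration, consider the following sketch (in Python, with hypothetical toy attributes and rules) of two rules of differing generality that cover exactly the same training cases yet classify an unseen case differently:

    # Hypothetical illustration: two rules of differing generality that
    # cover exactly the same training cases but disagree on an unseen case.
    training_data = [
        ({"outlook": "sunny", "temp": "hot"},  "play"),
        ({"outlook": "sunny", "temp": "mild"}, "play"),
        ({"outlook": "rain",  "temp": "mild"}, "stay"),
    ]

    def specific_rule(x):
        # More specific: outlook = sunny AND temp in {hot, mild} -> play
        return x["outlook"] == "sunny" and x["temp"] in ("hot", "mild")

    def general_rule(x):
        # More general: outlook = sunny -> play
        return x["outlook"] == "sunny"

    # Both rules cover exactly the same training cases ...
    assert [x for x, _ in training_data if specific_rule(x)] == \
           [x for x, _ in training_data if general_rule(x)]

    # ... but they classify an unseen case differently.
    unseen = {"outlook": "sunny", "temp": "cool"}
    print(specific_rule(unseen), general_rule(unseen))  # -> False True

Nothing in the training data distinguishes the two rules; the hypotheses presented in this paper concern how their accuracy on unseen cases can be expected to differ.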
All learning algorithms confront this problem. This is starkly illustrated by
the large numbers of rules with very high values for any given interestingness
measure that are typically discovered during association rule discovery. Many
systems that learn rule sets for the purpose of prediction mask this problem
by making arbitrary choices between rules with equivalent performance on the