Generality Is Predictive of Prediction Accuracy
Geoffrey I. Webb¹ and Damien Brain²

¹ Faculty of Information Technology, Monash University, Clayton, Vic 3800, Australia
webb@infotech.monash.edu.au

² UTelco Systems, Level 50/120 Collins St, Melbourne, Vic 3001, Australia
damien.brain@utelcosystems.com.au
Abstract. During knowledge acquisition it frequently occurs that multiple alternative potential rules all appear equally credible. This paper addresses the dearth of formal analysis about how to select between such alternatives. It presents two hypotheses about the expected impact of selecting between classification rules of differing levels of generality in the absence of other evidence about their likely relative performance on unseen data. We argue that the accuracy on unseen data of the more general rule will tend to be closer to that of a default rule for the class than will that of the more specific rule. We also argue that, in comparison to the more general rule, the accuracy of the more specific rule on unseen cases will tend to be closer to the accuracy obtained on training data. Experimental evidence is provided in support of these hypotheses. These hypotheses can be useful for selecting between rules in order to achieve specific knowledge acquisition objectives.
1 Introduction
In many knowledge acquisition contexts there will be many classification rules
that perform equally well on the training data. For example, as illustrated by
the version space [1], there will often be alternative rules of differing degrees
of generality, all of which agree with the training data. However, even when we do not expect to find rules that are strictly consistent with the training data, that is, when we allow rules to misclassify some training cases, there will often be many rules that all cover exactly the same training cases. If we are selecting rules to use for some decision-making task, we must choose between such rules with identical performance on the
training data. To do so requires a learning bias [2], a means of selecting between
competing hypotheses that utilizes criteria beyond those strictly encapsulated
in the training data.
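As a minimal illustration, consider the following sketch (in Python, with hypothetical toy attributes and rules) of two rules of differing generality that cover exactly the same training cases yet classify an unseen case differently:

    # Hypothetical illustration: two rules of differing generality that
    # cover exactly the same training cases but disagree on an unseen case.
    training_data = [
        ({"outlook": "sunny", "temp": "hot"},  "play"),
        ({"outlook": "sunny", "temp": "mild"}, "play"),
        ({"outlook": "rain",  "temp": "mild"}, "stay"),
    ]

    def specific_rule(x):
        # More specific: outlook = sunny AND temp in {hot, mild} -> play
        return x["outlook"] == "sunny" and x["temp"] in ("hot", "mild")

    def general_rule(x):
        # More general: outlook = sunny -> play
        return x["outlook"] == "sunny"

    # Both rules cover exactly the same training cases ...
    assert [x for x, _ in training_data if specific_rule(x)] == \
           [x for x, _ in training_data if general_rule(x)]

    # ... but they classify an unseen case differently.
    unseen = {"outlook": "sunny", "temp": "cool"}
    print(specific_rule(unseen), general_rule(unseen))  # -> False True

Nothing in the training data distinguishes the two rules; the hypotheses presented in this paper concern how their accuracy on unseen cases can be expected to differ.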
All learning algorithms confront this problem. This is starkly illustrated by
the large numbers of rules with very high values for any given interestingness
measure that are typically discovered during association rule discovery. Many
systems that learn rule sets for the purpose of prediction mask this problem
by making arbitrary choices between rules with equivalent performance on the