Generality Is Predictive of Prediction Accuracy - Data Mining: Theory, Methodology, Techniques, and Applications

While our experiments have been performed in a machine learning context,

the results are applicable in wider knowledge acquisition contexts. For example,

interactive knowledge acquisition environments [3, 13] present users with

alternative rules, all of which perform equally well on example data. Where the user

is unable to bring external knowledge to bear to make an informed judgement

about the relative merits of those rules, the system is able to offer no further

advice. Our experiments suggest that relative generality is a factor that an

interactive knowledge acquisition system might profitably utilize.

Our experiments also demonstrate that the effect that we discuss is one that

applies frequently in real-world knowledge acquisition tasks. The alternative

rules used in our experiments were all rules of varying levels of generality that

covered exactly the same training instances. In other words, it was not possible

to distinguish between these rules using traditional measures of rule quality

based on performance on a training set, such as information measures. The

only exceptions were the data sets for which the rules at differing levels of generality

were all identical. In all such cases the results were excluded from the

win/draw/loss record reported in Tables 3 to 5. Hence the sum of the values

in each win/draw/loss record places a lower bound on the number of data sets

for which there were variants of the initial rule all of which covered the same

training instances. Thus, for at least 47 out of 50 data sets, there are variants of

the C4.5rules rule with the greatest cover that cover exactly the same training

cases. For at least 38 out of 50 data sets, there are variants of the first rule

generated by C4.5rules that cover exactly the same training cases. This effect is

not a hypothetical abstraction; it is a frequent occurrence of immediate practical

import.
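
The situation described above can be illustrated with a minimal sketch. The attribute names, thresholds, and data below are invented for illustration and are not taken from the experiments; the sketch only shows how two rules at different levels of generality can cover exactly the same training cases, so that no training-set measure separates them, while still disagreeing on an unseen case.

```python
# Hypothetical illustration: two rules that cover the same training cases
# but differ in generality. All data and rule conditions are invented.
from typing import Callable, Dict, FrozenSet, List

Case = Dict[str, float]
Rule = Callable[[Case], bool]


def cover(rule: Rule, cases: List[Case]) -> FrozenSet[int]:
    """Return the indices of the cases that satisfy the rule."""
    return frozenset(i for i, c in enumerate(cases) if rule(c))


# Toy training set (assumed values).
training: List[Case] = [
    {"age": 25, "income": 30000},
    {"age": 35, "income": 45000},
    {"age": 55, "income": 80000},
]


def specific(c: Case) -> bool:
    """The more specific rule: two conditions."""
    return c["age"] < 40 and c["income"] < 50000


def general(c: Case) -> bool:
    """A more general variant: one condition deleted."""
    return c["age"] < 40


# Both rules cover the same training cases, so any quality measure
# computed from training-set performance gives them the same score.
same_training_cover = cover(specific, training) == cover(general, training)

# An unseen case on which the two rules disagree.
unseen: Case = {"age": 30, "income": 90000}
disagree = specific(unseen) != general(unseen)
```

Here `same_training_cover` and `disagree` are both true: the rules are indistinguishable on the training data, yet the choice between them changes what is predicted for new cases.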

In such circumstances, when it is necessary to select between alternative rules

with equal performance on the training data, one approach has been to select

the least complex rule [14]. However, some recent authors have argued that

complexity is not an effective rule quality metric [8, 15]. We argue here that

generality provides an alternative criterion on which to select between such rules,

one that allows for reasoning about the trade-offs inherent in the choice of one

rule over the other, rather than providing a blanket prescription.

5 On the Difficulty of Measuring Degree of Generalization

It might be tempting to believe that our hypotheses could be extended by

introducing a measure of magnitude of generalization together with predictions

about the magnitude of the effects on prediction accuracy that may be expected

from generalizations of different magnitude.

However, we believe that it is not feasible to develop meaningful measures of

magnitude of generalization suitable for such a purpose. Consider, for example,

the possibility of generalizing a rule with conditions age < 40 and income <

50000 by deleting either condition. Which is the greater generalization? It might

be thought that the greater generalization is the one that covers the greater

number of cases. However, if one rule covers more cases than another then there
