5. Z(D ∩ D′) = W(D ∩ D′) because Z(D) = W(D).
6. Z(D′ − D) ⊆ W(D′ − D) because Z(D′) ⊆ W(D′).
7. from 2-6, E(Z → y, Z(D ∩ D′)) is a larger proportion of the error of Z → y
than is E(W → y, W(D ∩ D′)) of W → y, and hence performance on D ∩ D′ is a
larger component of the performance of Z → y and performance on D′ − D is a
larger component of the performance of W → y.
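To make these set relations concrete, the following minimal sketch checks steps
5 and 6 on invented data; the integer instances and the antecedents Z and W are
hypothetical, chosen only so that Z and W have identical training coverage
while W is strictly more general.

```python
# Toy check of steps 5 and 6. Instances are integers; Z and W are
# hypothetical antecedents with identical training coverage, W strictly
# more general than Z.

D       = {1, 2, 3, 4, 5}        # training set
D_prime = {4, 5, 6, 7, 8, 9}     # test set D'

def Z(x): return 4 <= x <= 6     # more specific antecedent
def W(x): return 4 <= x <= 8     # more general antecedent

def cover(rule, cases):
    """Subset of `cases` satisfying the rule's antecedent."""
    return {x for x in cases if rule(x)}

assert cover(Z, D) == cover(W, D)                      # identical empirical evidence
assert cover(Z, D & D_prime) == cover(W, D & D_prime)  # step 5
assert cover(Z, D_prime) <= cover(W, D_prime)
assert cover(Z, D_prime - D) <= cover(W, D_prime - D)  # step 6

# Step 7's consequence: W -> y covers more off-training-set cases, so
# shared error on D ∩ D' is a smaller proportion of its total performance.
print(len(cover(Z, D_prime - D)), len(cover(W, D_prime - D)))  # 1 3
```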
However, in most domains of interest the dimensionality of the instance space will
be very high. In consequence, for realistic training and test sets the proportion
of the training set that appears in the test set, |D ∩ D′| / |D|, will be
small. Hence this
effect will be negligible, as performance on the training set will be a negligible
portion of total performance. What we are more interested in is off-training-
set error. We contend that the force of these hypotheses will be stronger than
is accounted for by the overlap between training and test sets, and hence that
they do apply to off-training-set error. We note, however, that it is trivial
to construct no-free-lunch proofs, in the manner of Wolpert [5] and Schaffer
[6], showing that this is not, in general, true. Rather, we contend that the
hypotheses will in general be true for 'real-world' learning tasks. We justify
this contention by recourse to the similarity assumption [7], that in the absence
of other information, the greater the similarity between two objects in other
respects, the greater the probability of their both belonging to the same class. We
believe that most machine learning algorithms depend upon this assumption, and
that this assumption is reasonable for real-world knowledge acquisition tasks.
Test set cases covered by a more general but not a more specific rule are likely
to be less similar to training cases covered by both rules than are test set cases
covered by the more specific rule. Hence satisfying the left-hand-side of the more
specific rule provides stronger evidence of likely class membership.
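For a sense of scale, the following back-of-envelope sketch bounds the overlap
under the simplifying assumption (ours, not the paper's) that instances are
drawn uniformly and independently from a space of n binary attributes:

```python
# Rough bound on the chance that a test case also appears in the
# training set, assuming instances drawn uniformly from n binary
# attributes. The sizes m and n are hypothetical.
m, n = 10_000, 50                 # training-set size and attribute count
p_seen = m / 2 ** n               # P(a given test case occurs in D) <= m / 2^n
print(f"P(test case in D) <= {p_seen:.1e}")   # about 8.9e-12
```

Even for a modest fifty binary attributes, the expected overlap is vanishingly
small, which is why performance on D ∩ D′ contributes almost nothing to total
test-set performance.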
A final point that should be noted is that these hypotheses apply to individual
classification rules — structures that associate an identified region of an instance
space with a single class. However, as will be discussed in more detail below, we
believe that the principle is nonetheless highly relevant to 'complete classifiers,'
such as decision trees, that assign different regions of the instance space to differ-
ent classes. This is because each individual region within a 'complete classifier'
(such as a decision tree leaf) satisfies our definition of a classification rule, and
hence the hypotheses can cast light on the likely consequences of relabeling sub-
regions of the instance space within such a classifier (for example, generalizing
one leaf of a decision tree at the expense of another, as proposed elsewhere [8]).
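To make the leaf-to-rule correspondence concrete, the sketch below enumerates
the rule associated with each leaf of a small invented decision tree; the tree,
its attributes, and the leaf_rules helper are illustrative only.

```python
# Each leaf of a decision tree is itself a classification rule: the
# conjunction of tests on the path to the leaf identifies a region of
# the instance space and assigns it a single class. The tree below and
# its attributes are hypothetical.

tree = ("outlook",
        {"sunny":    ("windy", {"true": "no", "false": "yes"}),
         "overcast": "yes",
         "rain":     "no"})

def leaf_rules(node, conditions=()):
    """Yield one (antecedent, class) pair per leaf of the tree."""
    if isinstance(node, str):                 # a leaf: a class label
        yield conditions, node
        return
    attribute, branches = node
    for value, child in branches.items():
        yield from leaf_rules(child, conditions + ((attribute, value),))

for lhs, cls in leaf_rules(tree):
    print(" AND ".join(f"{a}={v}" for a, v in lhs), "->", cls)
```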
2 Evaluation
To evaluate these hypotheses we sought to generate rules of varying generality
but identical empirical evidence (no other evidence source being considered in
the research), and to test the hypotheses' predictions with respect to these rules.
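One way to realise this, sketched below on invented data, is to drop a
condition from a specific rule's antecedent and keep the generalization only
when its training coverage is unchanged; the dataset and the drop-one-condition
procedure are illustrative assumptions, not necessarily the paper's method.

```python
# Sketch: generalize a rule while holding its empirical evidence fixed.
# Drop one condition from the antecedent Z and keep the result W only
# if W covers exactly the same training cases as Z. Data and conditions
# are invented for illustration.

D = [{"a": 1, "b": 0, "c": 1},
     {"a": 1, "b": 1, "c": 0},
     {"a": 0, "b": 0, "c": 0}]

Z = {"a": 1, "c": 1}                 # specific antecedent: a=1 AND c=1

def coverage(antecedent, data):
    """Indices of training cases satisfying every condition."""
    return {i for i, case in enumerate(data)
            if all(case[k] == v for k, v in antecedent.items())}

for drop in Z:                       # try each one-condition generalization
    W = {k: v for k, v in Z.items() if k != drop}
    if coverage(W, D) == coverage(Z, D):
        print("more general rule with identical evidence:", W)
```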
We wished to provide some evaluation both of whether the predicted effects
are general (with respect to rules with the relevant properties selected at random)
 