5. Z(D ∩ D′) = W(D ∩ D′) because Z(D) = W(D).
6. Z(D′ − D) ⊆ W(D′ − D) because Z(D′) ⊆ W(D′).
7. from 2-6, E(Z → y, Z(D ∩ D′)) is a larger proportion of the error of Z → y
than is E(W → y, W(D ∩ D′)) of W → y, and hence performance on D ∩ D′ is a
larger component of the performance of Z → y and performance on D′ − D is a
larger component of the performance of W → y.
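To make these set relations concrete, the following minimal sketch checks steps
5 and 6 on invented data; the integer instances and the antecedents Z and W are
hypothetical, chosen only so that Z and W have identical training coverage
while W is strictly more general.

```python
# Toy check of steps 5 and 6. Instances are integers; Z and W are
# hypothetical antecedents with identical training coverage, W strictly
# more general than Z.

D       = {1, 2, 3, 4, 5}        # training set
D_prime = {4, 5, 6, 7, 8, 9}     # test set D'

def Z(x): return 4 <= x <= 6     # more specific antecedent
def W(x): return 4 <= x <= 8     # more general antecedent

def cover(rule, cases):
    """Subset of `cases` satisfying the rule's antecedent."""
    return {x for x in cases if rule(x)}

assert cover(Z, D) == cover(W, D)                      # identical empirical evidence
assert cover(Z, D & D_prime) == cover(W, D & D_prime)  # step 5
assert cover(Z, D_prime) <= cover(W, D_prime)
assert cover(Z, D_prime - D) <= cover(W, D_prime - D)  # step 6

# Step 7's consequence: W -> y covers more off-training-set cases, so
# shared error on D ∩ D' is a smaller proportion of its total performance.
print(len(cover(Z, D_prime - D)), len(cover(W, D_prime - D)))  # 1 3
```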
However, in most domains of interest the dimensionality of the instance space will
be very high. In consequence, for realistic training and test sets the proportion
of the training set that appears in the test set, |D ∩ D′| / |D|, will be
small. Hence this
effect will be negligible, as performance on the training set will be a negligible
portion of total performance. What we are more interested in is off-training-
set error. We contend that the force of these hypotheses will be stronger than
is accounted for by the overlap between training and test sets, and hence that
they do apply to off-training-set error. We note, however, that it is trivial
to construct no-free-lunch proofs, in the manner of Wolpert [5] and Schaffer
[6], showing that this is not, in general, true. Rather, we contend that the
hypotheses will in general be true for 'real-world' learning tasks. We justify
this contention by recourse to the similarity assumption [7], that in the absence
of other information, the greater the similarity between two objects in other
respects, the greater the probability of their both belonging to the same class. We
believe that most machine learning algorithms depend upon this assumption, and
that this assumption is reasonable for real-world knowledge acquisition tasks.
Test set cases covered by a more general but not a more specific rule are likely
to be less similar to training cases covered by both rules than are test set cases
covered by the more specific rule. Hence satisfying the left-hand-side of the more
specific rule provides stronger evidence of likely class membership.
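For a sense of scale, the following back-of-envelope sketch bounds the overlap
under the simplifying assumption (ours, not the paper's) that instances are
drawn uniformly and independently from a space of n binary attributes:

```python
# Rough bound on the chance that a test case also appears in the
# training set, assuming instances drawn uniformly from n binary
# attributes. The sizes m and n are hypothetical.
m, n = 10_000, 50                 # training-set size and attribute count
p_seen = m / 2 ** n               # P(a given test case occurs in D) <= m / 2^n
print(f"P(test case in D) <= {p_seen:.1e}")   # about 8.9e-12
```

Even for a modest fifty binary attributes, the expected overlap is vanishingly
small, which is why performance on D ∩ D′ contributes almost nothing to total
test-set performance.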
A final point that should be noted is that these hypotheses apply to individual
classification rules — structures that associate an identified region of an instance
space with a single class. However, as will be discussed in more detail below, we
believe that the principle is nonetheless highly relevant to 'complete classifiers,'
such as decision trees, that assign different regions of the instance space to differ-
ent classes. This is because each individual region within a 'complete classifier'
(such as a decision tree leaf) satisfies our definition of a classification rule, and
hence the hypotheses can cast light on the likely consequences of relabeling sub-
regions of the instance space within such a classifier (for example, generalizing
one leaf of a decision tree at the expense of another, as proposed elsewhere [8]).
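To make the leaf-to-rule correspondence concrete, the sketch below enumerates
the rule associated with each leaf of a small invented decision tree; the tree,
its attributes, and the leaf_rules helper are illustrative only.

```python
# Each leaf of a decision tree is itself a classification rule: the
# conjunction of tests on the path to the leaf identifies a region of
# the instance space and assigns it a single class. The tree below and
# its attributes are hypothetical.

tree = ("outlook",
        {"sunny":    ("windy", {"true": "no", "false": "yes"}),
         "overcast": "yes",
         "rain":     "no"})

def leaf_rules(node, conditions=()):
    """Yield one (antecedent, class) pair per leaf of the tree."""
    if isinstance(node, str):                 # a leaf: a class label
        yield conditions, node
        return
    attribute, branches = node
    for value, child in branches.items():
        yield from leaf_rules(child, conditions + ((attribute, value),))

for lhs, cls in leaf_rules(tree):
    print(" AND ".join(f"{a}={v}" for a, v in lhs), "->", cls)
```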
2 Evaluation
To evaluate these hypotheses we sought to generate rules of varying generality
but identical empirical evidence (no other evidence source being considered in
the research), and to test the hypotheses' predictions with respect to these rules.
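One way to realise this, sketched below on invented data, is to drop a
condition from a specific rule's antecedent and keep the generalization only
when its training coverage is unchanged; the dataset and the drop-one-condition
procedure are illustrative assumptions, not necessarily the paper's method.

```python
# Sketch: generalize a rule while holding its empirical evidence fixed.
# Drop one condition from the antecedent Z and keep the result W only
# if W covers exactly the same training cases as Z. Data and conditions
# are invented for illustration.

D = [{"a": 1, "b": 0, "c": 1},
     {"a": 1, "b": 1, "c": 0},
     {"a": 0, "b": 0, "c": 0}]

Z = {"a": 1, "c": 1}                 # specific antecedent: a=1 AND c=1

def coverage(antecedent, data):
    """Indices of training cases satisfying every condition."""
    return {i for i, case in enumerate(data)
            if all(case[k] == v for k, v in antecedent.items())}

for drop in Z:                       # try each one-condition generalization
    W = {k: v for k, v in Z.items() if k != drop}
    if coverage(W, D) == coverage(Z, D):
        print("more general rule with identical evidence:", W)
```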
We wished to provide some evaluation both of whether the predicted effects
are general (with respect to rules with the relevant properties selected at random)
 