since the selection of optimal complexity is based on the guaranteed generalization
risk. The bound for the guaranteed risk is expressed in terms of the Vapnik-Chervonenkis
dimension, and is a pessimistic overestimation of the growth function, which in turn
is an overestimation of the unknown Vapnik-Chervonenkis entropy. We formally recall
these notions later in the paper. All these overestimations contribute (unfortunately) to
the fact that, for a fixed sample size, SRM usually underestimates the optimal complexity
and chooses too simple a model.
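As a reminder of the shape of these quantities (a sketch only; the exact constants differ between formulations in the literature), the chain of overestimations and one commonly quoted form of Vapnik's guaranteed-risk bound for the classification case, for a class of VC dimension h and a sample of size I, read:
\[
H^{\Lambda}(I) \;\le\; G^{\Lambda}(I) \;\le\; h\left(\ln\frac{I}{h}+1\right) \qquad \text{for } I > h,
\]
\[
R(\alpha) \;\le\; R_{\mathrm{emp}}(\alpha) + \sqrt{\frac{h\left(\ln\frac{2I}{h}+1\right)-\ln\frac{\eta}{4}}{I}} \qquad \text{with probability at least } 1-\eta,
\]
where H^Λ denotes the Vapnik-Chervonenkis entropy, G^Λ the growth function, R the true risk and R_emp the empirical risk. The final bound depends only on h and is therefore the most pessimistic link in the chain.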
Results presented in this paper may be regarded as conceptually akin to the results
by Holden [11,12], where error bounds on cross-validation and so-called sanity-check
bounds are derived. The sanity-check bound is a proof, for a large class of learning algorithms,
that the error of the leave-one-out estimate is not much worse (by O(h/I))
than the worst-case behavior of the training error estimate, where h stands for the Vapnik-Chervonenkis
dimension of the given set of functions and I stands for the sample size. The
name sanity-check refers to the fact that although we believe that under many circumstances
the leave-one-out estimate will perform better than the training error (and thus
justify its computational expense), the goal of the sanity-check bound is simply to prove
that it is not much worse than the training error [13].
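Schematically, and in our reading of the statement above (constants and logarithmic factors omitted; R denotes the true risk, \(\hat{R}_{\mathrm{emp}}\) and \(\hat{R}_{\mathrm{loo}}\) the training-error and leave-one-out estimates), such a sanity-check bound says that
\[
\big|\hat{R}_{\mathrm{loo}} - R\big| \;\le\; \sup_{f}\big|\hat{R}_{\mathrm{emp}}(f) - R(f)\big| \;+\; O\!\left(\frac{h}{I}\right),
\]
i.e. the leave-one-out estimate is, up to the additive term, no less reliable than the worst-case behavior of the training error estimate.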
These results were further generalized by Kearns [13,14,15] using the notion of
(β₁, β₂)-error stability² rather than (β₁, β₂)-hypothesis stability³ imposed on the learning algorithm.
For the sake of comparison, and to set up the perspective for further reading of this paper,
we highlight some differences in meaning between our results and the results mentioned
above:
- we do not focus on how well the measured cross-validation result estimates the
generalization error, or how far it is from the training error in the leave-one-out case
(the sanity-check bounds [12,13]); instead, we want to make statements about the
cross-validation result without actually measuring it, thus remaining in the setting
of the SRM framework.
- in particular, we want to state probabilistically what ε-difference one can expect
between the known Vapnik bound and the unknown cross-validation result for given
conditions of the experiment,
- in consequence, we want to be able to calculate the necessary size of the training
sample, so that the ε is sufficiently small, and so that the optimal complexity
indicated via SRM is acceptable in the sense that cross-validation, if performed,
would probably indicate the same complexity; this statement may seem related to
the notion of sample complexity considered e.g. by Bartlett [16,17] or Ng [18], but
we do not find the sample size required for the algorithm to learn/generalize "well"
² We say that a learning algorithm has (β₁, β₂)-error stability if the generalization errors for two
models provided by this algorithm, using respectively a training sample of size I and a sample
with size lowered to I − 1, are β₁-close to each other with probability at least 1 − β₂. Obviously,
the smaller both β₁ and β₂ are, the more stable the algorithm.
³ We say that a learning algorithm has (β₁, β₂)-hypothesis stability if the two models provided
by this algorithm, using respectively a training sample of size I and a sample with size lowered
to I − 1, are β₁-close to each other with probability at least 1 − β₂, where the closeness of models
is measured by some functional metric, e.g. L₁, L₂, etc.
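Read formally (a sketch of how we interpret the two footnote definitions above; A(S) denotes the model returned by the algorithm on sample S, R(·) its generalization error, and S_I, S_{I−1} the full and the reduced training samples), these conditions can be written as
\[
(\beta_1,\beta_2)\text{-error stability:}\qquad P\Big(\big|R\big(A(S_I)\big) - R\big(A(S_{I-1})\big)\big| \le \beta_1\Big) \;\ge\; 1-\beta_2,
\]
\[
(\beta_1,\beta_2)\text{-hypothesis stability:}\qquad P\Big(\big\|A(S_I) - A(S_{I-1})\big\|_{L_1} \le \beta_1\Big) \;\ge\; 1-\beta_2,
\]
with the L₁ metric in the second definition replaceable by another functional metric, as noted in footnote 3.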