since the selection of optimal complexity is based on the guaranteed generalization risk. The bound for the guaranteed risk is expressed in terms of the Vapnik-Chervonenkis dimension, and is a pessimistic overestimation of the growth function, which in turn is an overestimation of the unknown Vapnik-Chervonenkis entropy. We formally recall these notions later in the paper. All these overestimations contribute (unfortunately) to the fact that, for a fixed sample size, SRM usually underestimates the optimal complexity and chooses too simple a model.
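As a side illustration (ours, not from the paper), the SRM selection rule can be sketched numerically: for each model class of VC dimension h, add Vapnik's confidence term to the empirical error and pick the minimizer of the resulting guaranteed risk. The particular bound form, the sample size, and the empirical error values below are illustrative assumptions.

```python
import math

def vc_confidence(h, I, eta=0.05):
    """Vapnik's confidence term for VC dimension h, sample size I,
    and confidence level 1 - eta (a pessimistic bound)."""
    return math.sqrt((h * (math.log(2 * I / h) + 1) - math.log(eta / 4)) / I)

def srm_select(emp_errors, I, eta=0.05):
    """Pick the model class (indexed by h = 1, 2, ...) minimizing the
    guaranteed risk = empirical error + VC confidence term."""
    bounds = [e + vc_confidence(h, I, eta)
              for h, e in enumerate(emp_errors, start=1)]
    best_h = min(range(len(bounds)), key=bounds.__getitem__) + 1
    return best_h, bounds

# Hypothetical empirical errors decreasing with complexity h = 1..5:
emp = [0.30, 0.18, 0.12, 0.10, 0.09]
h_star, bounds = srm_select(emp, I=200)
```

Note that the empirical error alone is minimized at the most complex class (h = 5), while the guaranteed risk points at a simpler one, which is exactly the conservatism of SRM discussed above.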
Results presented in this paper may be regarded as conceptually akin to the results of Holden [11,12], where error bounds on cross-validation and so-called sanity-check bounds are derived. The sanity-check bound is a proof, for a large class of learning algorithms, that the error of the leave-one-out estimate is not much worse, namely O(h/I), than the worst-case behavior of the training error estimate, where h stands for the Vapnik-Chervonenkis dimension of the given set of functions and I stands for the sample size. The name sanity-check refers to the fact that although we believe that under many circumstances the leave-one-out estimate will perform better than the training error (and thus justify its computational expense), the goal of the sanity-check bound is simply to prove that it is not much worse than the training error [13].
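As a concrete illustration (ours, not Holden's), the contrast between the training error and the leave-one-out estimate can be seen on a tiny 1-nearest-neighbour example: 1-NN memorizes the training sample, so its training error is zero, whereas leave-one-out exposes a nonzero error. The data points below are invented for illustration.

```python
import numpy as np

def nn_predict(train_x, train_y, x):
    """1-nearest-neighbour prediction in one dimension."""
    return train_y[np.argmin(np.abs(train_x - x))]

def loo_error(x, y):
    """Leave-one-out estimate: train on I - 1 points, test on the held-out one."""
    errs = 0
    for i in range(len(x)):
        mask = np.arange(len(x)) != i
        errs += nn_predict(x[mask], y[mask], x[i]) != y[i]
    return errs / len(x)

x = np.array([0.0, 0.1, 0.2, 0.8, 0.9, 1.0, 0.45])
y = np.array([0,   0,   0,   1,   1,   1,   1])

# 1-NN memorizes the sample, so the resubstitution (training) error is 0:
train_err = np.mean([nn_predict(x, y, xi) != yi for xi, yi in zip(x, y)])
loo = loo_error(x, y)  # nonzero: the isolated point at 0.45 is misclassified
```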
These results were further generalized by Kearns [13,14,15] using the notion of (β1, β2)-error stability² rather than (β1, β2)-hypothesis stability³ imposed on the learning algorithm.
For the sake of comparison, and to set up the perspective for further reading of this paper, we highlight some differences between the meaning of our results and the results mentioned above:
- we do not focus on how well the measured cross-validation result estimates the generalization error, or how far it is from the training error in the leave-one-out case (the sanity-check bounds [12,13]); instead, we want to make statements about the cross-validation result without actually measuring it, thus remaining in the setting of the SRM framework.
- in particular, we want to state probabilistically what ε-difference one can expect between the known Vapnik bound and the unknown cross-validation result for given conditions of the experiment,
- in consequence, we want to be able to calculate the necessary size of the training sample, so that ε is sufficiently small and so that the optimal complexity indicated via SRM is acceptable in the sense that cross-validation, if performed, would probably indicate the same complexity; this statement may seem related to the notion of sample complexity considered e.g. by Bartlett [16,17] or Ng [18], but we do not find the sample size required for the algorithm to learn/generalize “well”
² We say that a learning algorithm has (β1, β2)-error stability if the generalization errors of two models provided by this algorithm using, respectively, a training sample of size I and a sample with size lowered to I − 1 are β1-close to each other with probability at least 1 − β2. Obviously, the smaller both β1, β2 are, the more stable the algorithm.
³ We say that a learning algorithm has (β1, β2)-hypothesis stability if the two models provided by this algorithm using, respectively, a training sample of size I and a sample with size lowered to I − 1 are β1-close to each other with probability at least 1 − β2, where closeness of models is measured by some functional metric, e.g. L1, L2, etc.
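A minimal numerical sketch (our illustration, not Kearns') of checking hypothesis stability in the L2 sense: run the same algorithm on samples of size I and I − 1 and measure the L2 distance between the two resulting models. The least-squares linear model and the synthetic data are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def fit_line(x, y):
    """Least-squares fit y ≈ a*x + b; returns the model as a function."""
    a, b = np.polyfit(x, y, 1)
    return lambda t: a * t + b

# Sample of size I from a noisy linear target (illustrative assumption).
I = 50
x = rng.uniform(0, 1, I)
y = 2 * x + 1 + rng.normal(0, 0.1, I)

f_full = fit_line(x, y)              # model trained on I points
f_minus = fit_line(x[:-1], y[:-1])   # model trained on I - 1 points

# L2 distance between the two hypotheses over [0, 1], estimated on a grid;
# a stable algorithm keeps this distance small with high probability.
grid = np.linspace(0, 1, 1000)
l2_dist = np.sqrt(np.mean((f_full(grid) - f_minus(grid)) ** 2))
```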