Information Technology Reference
In-Depth Information
Probabilistic Connection between Cross-Validation
and Vapnik Bounds
Przemysław Klesk
Department of Methods of Artificial Intelligence and Applied Mathematics
Westpomeranian University of Technology
ul. Zolnierska 49, Szczecin, Poland
pklesk@wi.zut.edu.pl
Abstract. In the paper we analyze a connection between outcomes of the cross-
validation procedure and Vapnik bounds [1,2] on generalization of learning
machines. We do not focus on how well the measured cross-validation outcome
estimates the generalization error or how far it is from the training error; instead,
we want to make statements about the cross-validation result without actually
measuring it. In particular we want to state probabilistically what
-difference one
can expect between the known Vapnik bound and the unknown cross-validation
result for given conditions of the experiment. In the consequence, we are able to
calculate the necessary size of the training sample, so that the ε is sufficiently
small; and so that the optimal complexity indicated via SRM is acceptable in
the sense that cross-validation, if performed, would probably indicate the same
complexity. We consider a non-stratified variant of cross-validation, which is con-
venient for the main theorem.
ε
Keywords. Statistical learning theory, Bounds on generalization, Cross valida-
tion, Empirical risk minimization, Structural risk minimization, Vapnik-Chervon-
enkis dimension.
1
Introduction and Notation
One part of the Statistical Learning Theory developed by Vapnik [1,2,3] is the theory
of bounds . It provides probabilistic bounds on generalization of learning machines. The
key mathematical tools applied to derive the bounds in their additive versions are Cher-
noff and Hoeffding inequalities 1 [2,4,5,6].
We use this theory to show a probabilistic relationship between two approaches for
complexity selection: n-fold cross-validation (popular among practioner modelers) and
Structural Risk Minimization proposed by Vapnik (rarely met in practice) [7,8,9,10]. We
remind that SRM is O ( n ) times faster than n -fold cross-validation (since SRM does not
perform any repetitions/folds per single fixed complexity, nor testing) but less accurate,
Chernoff inequality: P | ν I p |≥ ε 2exp (
2 I ) , Hoeffding inequality: P | X I EX |≥
1
ε 2exp (
2 I
A 2 ) . Meaning (respectively): observed frequencies on a sample of size I
converge to the true probability as I grows large; analogically: means of a random variable
(bounded by A and B ) converge to the expected value. It is a in-probability-convergence and
its rate is exponential.
B 2
 
Search WWH ::




Custom Search