Probabilistic Connection between Cross-Validation and Vapnik Bounds - Agents and Artificial Intelligence

Information Technology Reference

In-Depth Information

Probabilistic Connection between Cross-Validation

and Vapnik Bounds

Przemysław Klesk

Department of Methods of Artificial Intelligence and Applied Mathematics

Westpomeranian University of Technology

ul. Zolnierska 49, Szczecin, Poland

pklesk@wi.zut.edu.pl

Abstract. In the paper we analyze a connection between outcomes of the cross-

validation procedure and Vapnik bounds [1,2] on generalization of learning

machines. We do not focus on how well the measured cross-validation outcome

estimates the generalization error or how far it is from the training error; instead,

we want to make statements about the cross-validation result without actually

measuring it. In particular we want to state probabilistically what

-difference one

can expect between the known Vapnik bound and the unknown cross-validation

result for given conditions of the experiment. In the consequence, we are able to

calculate the necessary size of the training sample, so that the ε is sufficiently

small; and so that the optimal complexity indicated via SRM is acceptable in

the sense that cross-validation, if performed, would probably indicate the same

complexity. We consider a non-stratified variant of cross-validation, which is con-

venient for the main theorem.

ε

Keywords. Statistical learning theory, Bounds on generalization, Cross valida-

tion, Empirical risk minimization, Structural risk minimization, Vapnik-Chervon-

enkis dimension.

1

Introduction and Notation

One part of the Statistical Learning Theory developed by Vapnik [1,2,3] is the theory

of bounds . It provides probabilistic bounds on generalization of learning machines. The

key mathematical tools applied to derive the bounds in their additive versions are Cher-

noff and Hoeffding inequalities 1 [2,4,5,6].

We use this theory to show a probabilistic relationship between two approaches for

complexity selection: n-fold cross-validation (popular among practioner modelers) and

Structural Risk Minimization proposed by Vapnik (rarely met in practice) [7,8,9,10]. We

remind that SRM is O ( n ) times faster than n -fold cross-validation (since SRM does not

perform any repetitions/folds per single fixed complexity, nor testing) but less accurate,

Chernoff inequality: P | ν I − p |≥ ε ≤ 2exp ( − 2ε

2 I ) , Hoeffding inequality: P | X I − EX |≥

1

ε ≤ 2exp ( −

2 I

2ε

A 2 ) . Meaning (respectively): observed frequencies on a sample of size I

converge to the true probability as I grows large; analogically: means of a random variable

(bounded by A and B ) converge to the expected value. It is a in-probability-convergence and

its rate is exponential.

B 2

−

Agents and Artificial Intelligence

Search WWH ::

Custom Search

Home