Probabilistic Connection between Cross-Validation and Vapnik Bounds - Agents and Artificial Intelligence

1.2 Notation Related to Cross-Validation

In this paper, we consider the non-stratified variant of the $n$-fold cross-validation procedure [20]. In each fold (iteration), we first permute the data set and then split it at the same fixed point into two disjoint subsets: a training set and a testing set. Randomness is thus guaranteed by a separate permutation in each fold, and we do not require the training or testing sets of different folds to be pairwise disjoint. Since the permutations are independent, the folds are independent as well.
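The per-fold procedure described above can be sketched in Python as follows. This is only an illustrative sketch, not the authors' implementation: the function name `permutation_folds` and the parameters `n_folds`, `split_point`, and `seed` are assumptions introduced here.

```python
import random


def permutation_folds(data, n_folds, split_point, seed=None):
    """Yield (training, testing) pairs, one per fold.

    Hypothetical helper: each fold draws a fresh permutation of the data
    and cuts it at the same fixed index `split_point`.
    """
    rng = random.Random(seed)
    for _ in range(n_folds):
        permuted = list(data)
        rng.shuffle(permuted)  # a new permutation for every fold
        yield permuted[:split_point], permuted[split_point:]


# Example: 10 points, 5 folds, 8 training and 2 testing points per fold.
folds = list(permutation_folds(range(10), n_folds=5, split_point=8, seed=0))
for training, testing in folds:
    print(sorted(testing))
```

Within a fold the two subsets are disjoint, but the same point may land in the testing sets of several folds, since nothing ties one fold's permutation to another's.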

Such an approach lies somewhere between classical $n$-fold cross-validation and bootstrapping [21]. In classical cross-validation, all $\binom{n}{2}$ pairs of testing sets are mutually disjoint, and hence the folds are dependent, whereas in bootstrapping, instead of repeatedly analyzing subsets of the data set, one repeatedly analyzes subsamples (drawn with replacement) of the data. For more information see [22,23,24].

We introduce the following notation. $I'$ and $I''$ stand for the sizes of the training and testing sets, respectively, where $I$ is the size of the whole data set:

$$I' = \frac{n-1}{n}\, I, \qquad I'' = \frac{1}{n}\, I.$$

Without loss of generality for the theorems and proofs, let $I$ be divisible by $n$, so that $I'$ and $I''$ are integers.
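As a quick numerical check of these sizes, the following sketch computes the training and testing set sizes for a given data set size and number of folds. The function name `split_sizes` is hypothetical, and the divisibility check simply mirrors the assumption made above.

```python
def split_sizes(total, n):
    """Return (training size, testing size) for a data set of `total` points.

    Hypothetical helper: the testing set takes 1/n of the data and the
    training set takes the remaining (n - 1)/n.
    """
    if total % n != 0:
        raise ValueError("total must be divisible by n")
    testing = total // n
    training = total - testing
    return training, testing


print(split_sizes(100, 5))  # (80, 20)
```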

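In a single fold, let

$$\{z'_1, z'_2, \ldots, z'_{I'}\} \quad \text{and} \quad \{z''_1, z''_2, \ldots, z''_{I''}\}$$

represent, respectively, the training set and the testing set, taken as a split of the whole permuted data set $\{z_1, z_2, \ldots, z_I\}$. Similarly, the empirical risks calculated as follows:

$$R'_{\mathrm{emp}}(\omega) = \frac{1}{I'} \sum_{i=1}^{I'} Q(z'_i, \omega), \tag{9}$$

$$R''_{\mathrm{emp}}(\omega) = \frac{1}{I''} \sum_{i=1}^{I''} Q(z''_i, \omega), \tag{10}$$

represent, respectively, the training error and the testing error, calculated for any function $\omega$. By $\omega_{I'}$ we denote the function that minimizes the empirical training risk:

$$R'_{\mathrm{emp}}(\omega_{I'}) = \inf_{\omega \in \Omega} R'_{\mathrm{emp}}(\omega) \tag{11}$$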
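To make the empirical risks and the minimizer concrete, here is a minimal Python sketch under strong simplifying assumptions: the set $\Omega$ is replaced by a small finite list of candidates, so that the infimum becomes a plain minimum, and $Q$ is taken to be a 0-1 loss on a toy threshold classifier. The names `empirical_risk`, `minimize_training_risk`, and `zero_one_loss`, as well as the toy data, are hypothetical and are not taken from the paper.

```python
def empirical_risk(loss, sample, omega):
    """Mean of the loss Q(z, omega) over all points z in the sample."""
    return sum(loss(z, omega) for z in sample) / len(sample)


def minimize_training_risk(loss, training, candidates):
    """Pick the candidate with the smallest empirical training risk.

    Assumption: `candidates` is a finite stand-in for the set Omega.
    """
    return min(candidates, key=lambda omega: empirical_risk(loss, training, omega))


def zero_one_loss(z, omega):
    """Toy Q: z = (x, y); predict class 1 when x >= omega."""
    x, y = z
    return int((x >= omega) != bool(y))


training = [(0.1, 0), (0.4, 0), (0.6, 1), (0.9, 1)]
testing = [(0.3, 0), (0.7, 1)]
thresholds = [0.0, 0.25, 0.5, 0.75, 1.0]

best = minimize_training_risk(zero_one_loss, training, thresholds)
training_error = empirical_risk(zero_one_loss, training, best)
testing_error = empirical_risk(zero_one_loss, testing, best)
print(best, training_error, testing_error)  # 0.5 0.0 0.0
```

For an infinite set of functions the infimum need not be attained by any single function, which is one reason this finite version is only an illustration.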

This notation is used when the context of the discussion is restricted to a single fold. When we need to broaden the context to all folds, $k = 1, 2, \ldots, n$, we write $\omega_{I',k}$ to denote the function that minimizes the empirical training risk in the $k$-th fold. Therefore, the final cross-validation result, an estimate of the generalization error, is the mean of the empirical testing risks $R''_{\mathrm{emp}}$ obtained using the functions $\omega_{I',k}$:
