$$
C = \frac{1}{n} \sum_{k=1}^{n} R_{\mathrm{emp}}(\omega_{I,k}). \tag{12}
$$
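To make the estimator concrete, here is a minimal Python sketch of (12); the per-fold risks in `fold_risks` are illustrative placeholders, not values from the text:

```python
# Cross-validation estimate C from equation (12): the mean of the
# empirical risks computed on the n folds.
fold_risks = [0.12, 0.15, 0.10, 0.14, 0.13]  # R_emp(omega_{I,k}), k = 1..n (placeholders)

n = len(fold_risks)
C = sum(fold_risks) / n  # C = (1/n) * sum_k R_emp(omega_{I,k})
print(f"C = {C:.4f}")    # prints C = 0.1280
```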
The independence of folds can be formally expressed in the following way: for any two fold indices $k \neq l$ and for any numbers $A$, $B$,
$$
P\bigl(R_{\mathrm{emp}}(\omega_{I,k}) = A,\; R_{\mathrm{emp}}(\omega_{I,l}) = B\bigr) = P\bigl(R_{\mathrm{emp}}(\omega_{I,k}) = A\bigr) \cdot P\bigl(R_{\mathrm{emp}}(\omega_{I,l}) = B\bigr).
$$
We stress the independence once again because, later on, we are going to sum up several independent probabilistic inequalities into one inequality, and we want the result to hold with an effective probability equal to the product of the component probabilities.
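To spell out the product rule this relies on (a worked statement under the independence assumption above, with $E_k$ denoting the event that the $k$-th inequality holds): if each of $n$ inequalities holds with probability at least $1 - \eta$ and the folds are independent, then all of them hold simultaneously with probability at least
$$
P\Bigl(\bigcap_{k=1}^{n} E_k\Bigr) = \prod_{k=1}^{n} P(E_k) \ge (1 - \eta)^n.
$$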
2 The Relationship for a Finite Set of Approximating Functions

2.1 Classification Learning Task
Similarly to Vapnik, let us start with the classification learning task and the simplest case of a finite set of $N$ indicator functions $\{Q(z, \omega_j)\}_{\omega_j \in \Omega}$, $j = 1, 2, \ldots, N$. Not to complicate the notation, we will keep writing $\omega_I$ in the sense of the optimal function minimizing the empirical risk on our finite sample of size $I$, instead of writing more formally e.g. $\omega_{j(I)}$⁷.
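For concreteness, here is a minimal Python sketch of this setup; the threshold classifiers, the sampling distribution, and all numbers are illustrative assumptions, not part of the text:

```python
import random

random.seed(0)

# A toy finite class of N indicator functions Q(z, omega_j):
# threshold classifiers on x in [0, 1] with 0/1 misclassification loss.
N = 10
thresholds = [j / N for j in range(N)]   # omega_1, ..., omega_N

def Q(z, theta):
    x, y = z
    prediction = 1 if x >= theta else 0
    return int(prediction != y)          # indicator loss

# A sample of size I from a hypothetical distribution (true boundary at 0.35).
I = 200
sample = []
for _ in range(I):
    x = random.random()
    sample.append((x, int(x >= 0.35)))

# omega_I: the function minimizing the empirical risk on this sample.
def r_emp(theta):
    return sum(Q(z, theta) for z in sample) / I

omega_I = min(thresholds, key=r_emp)
print(f"omega_I threshold = {omega_I}, R_emp = {r_emp(omega_I):.3f}")
```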
Vapnik shows [1,2] that with probability at least $1 - \eta$, the following bound on the true risk is satisfied:
$$
R(\omega_I) = \int_Z Q(z, \omega_I)\, dF(z) \;\le\; \underbrace{\frac{1}{I} \sum_{i=1}^{I} Q(z_i, \omega_I)}_{R_{\mathrm{emp}}(\omega_I)} + \sqrt{\frac{\ln N - \ln \eta}{2I}}. \tag{13}
$$
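As an illustration, here is a minimal Python sketch that evaluates the right-hand side of (13); the function name and the sample numbers are assumptions for the example, not values from the text:

```python
import math

def vapnik_finite_bound(r_emp, n_functions, i_samples, eta):
    """Right-hand side of (13): R_emp + sqrt((ln N - ln eta) / (2I))."""
    return r_emp + math.sqrt(
        (math.log(n_functions) - math.log(eta)) / (2 * i_samples)
    )

# Illustrative values: N = 100 functions, I = 1000 samples,
# empirical risk 0.10, confidence level 1 - eta = 0.95.
print(vapnik_finite_bound(r_emp=0.10, n_functions=100, i_samples=1000, eta=0.05))
```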
The argument is the following:
$$
P\Bigl(\sup_{1 \le j \le N} \bigl(R(\omega_j) - R_{\mathrm{emp}}(\omega_j)\bigr) \ge \varepsilon\Bigr) \;\le\; \sum_{j=1}^{N} P\bigl(R(\omega_j) - R_{\mathrm{emp}}(\omega_j) \ge \varepsilon\bigr) \;\le\; N \cdot \exp(-2\varepsilon^2 I).
$$
The last inequality holds because each term in the sum satisfies the Chernoff inequality $P\bigl(R(\omega_j) - R_{\mathrm{emp}}(\omega_j) \ge \varepsilon\bigr) \le \exp(-2\varepsilon^2 I)$. By substituting the right-hand side with a small probability $\eta$ and solving for $\varepsilon$ (the algebra is spelled out after the bound), one obtains the bound:
$$
R(\omega_j) - R_{\mathrm{emp}}(\omega_j) \;\le\; \sqrt{\frac{\ln N - \ln \eta}{2I}},
$$
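The solving step behind this, spelled out as routine algebra in the same notation:
$$
N \exp(-2\varepsilon^2 I) = \eta \;\Longrightarrow\; 2\varepsilon^2 I = \ln N - \ln \eta \;\Longrightarrow\; \varepsilon = \sqrt{\frac{\ln N - \ln \eta}{2I}}.
$$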
⁷ In the sense that $j(I) \in \{1, \ldots, N\}$ returns the index of the minimizer given our data set of size $I$.