Actually, one can easily see that as we take larger samples $I$ and stick to leave-one-out cross-validation, $n = I$, the coefficient $\sqrt{n/(n-1)}$ standing at $\sqrt{(\ln N - \ln \eta)/(2I)}$ goes to $1$, whereas the coefficient $\sqrt{n}$ standing at $\sqrt{-\ln \eta/(2I)}$ goes to infinity.
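To make the limit behaviour concrete, here is a minimal numeric check (Python); the coefficient forms $\sqrt{n/(n-1)}$ and $\sqrt{n}$ follow the bounds as reconstructed above:

```python
import math

# Leave-one-out cross-validation: n = I. As I grows, the coefficient
# sqrt(n/(n-1)) tends to 1, while the coefficient sqrt(n) grows without bound.
for I in (10, 100, 1000, 10000):
    n = I
    c1 = math.sqrt(n / (n - 1))  # stands at sqrt((ln N - ln eta) / (2I))
    c2 = math.sqrt(n)            # stands at sqrt(-ln eta / (2I))
    print(f"I = {I:5d}   sqrt(n/(n-1)) = {c1:.5f}   sqrt(n) = {c2:8.2f}")
```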
One might ask: for what choice of $n$ is each bound the tightest, given $\eta$, $I$, $N$? Treating $n$ for a moment as a continuous variable, we impose the conditions:

$$\frac{\partial \varepsilon_U(\eta, I, N, n)}{\partial n} = 0, \qquad \frac{\partial \varepsilon_L(\eta, I, N, n)}{\partial n} = 0,$$
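For the upper bound the computation runs as follows (a sketch, assuming, consistently with (36) below, that $\varepsilon_U$ takes the form $a\sqrt{n/(n-1)} + b\sqrt{n}$ with $a = \sqrt{(\ln N - \ln\eta)/(2I)}$ and $b = \sqrt{-\ln\eta/(2I)}$):

$$\frac{\partial}{\partial n}\sqrt{\frac{n}{n-1}} = -\frac{1}{2\sqrt{n}\,(n-1)^{3/2}}, \qquad \frac{\partial}{\partial n}\sqrt{n} = \frac{1}{2\sqrt{n}},$$

so the condition $\partial\varepsilon_U/\partial n = 0$ reduces to $(n-1)^{3/2} = a/b$, i.e. $n = 1 + (a/b)^{2/3}$,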
and we get the optimal $n$ values:

$$n_U = 1 + \sqrt[3]{\frac{\ln N - \ln \eta}{-\ln \eta}}, \qquad (36)$$

$$n_L = 1 + \sqrt[3]{\frac{2 \ln N - \ln \eta}{-\ln \eta}}. \qquad (37)$$

Note that these values do not depend on the sample size $I$.
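A minimal sketch (Python) evaluating (36) and (37); the sample values of $N$ and $\eta$ are illustrative only, and in practice the result would be rounded to an admissible integer number of folds:

```python
import math

def n_upper(N, eta):
    """Optimal n for the upper bound, eq. (36)."""
    return 1 + ((math.log(N) - math.log(eta)) / -math.log(eta)) ** (1 / 3)

def n_lower(N, eta):
    """Optimal n for the lower bound, eq. (37)."""
    return 1 + ((2 * math.log(N) - math.log(eta)) / -math.log(eta)) ** (1 / 3)

N, eta = 1000, 0.01
print(f"n_U = {n_upper(N, eta):.3f}")  # no dependence on the sample size I
print(f"n_L = {n_lower(N, eta):.3f}")
```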
2.2 Regression Estimation Learning Task
Now we consider a set of real-valued error functions, but we still stay with the simplest case, in which the set has a finite number of elements. We give theorems for the regression estimation learning task analogous to those for classification. We skip the proofs; the only changes they would require are the assumption of bounded functions and the use of Hoeffding's inequality in place of Chernoff's inequality.
Theorem 4. Let $\{Q(z, \omega_j)\}$, $\omega_j \in \Omega$, $j = 1, 2, \ldots, N$, be a finite set of real-valued bounded functions (regression estimation task) of size $N$, with $0 \le Q(z, \omega_j) \le B$. Then, for any $\eta > 0$, arbitrarily small, there is a small number

$$\alpha(\eta, n) = \sum_{k=1}^{n} \binom{n}{k} (-1)^{k+1} (2\eta)^k, \qquad (38)$$

and the number

$$\varepsilon(\eta, I, N, n) = \left(2\sqrt{\frac{n}{n-1}} + 1\right) B \sqrt{\frac{\ln N - \ln \eta}{2I}} + \left(\sqrt{n} + \sqrt{\frac{n}{n-1}}\right) B \sqrt{\frac{-\ln \eta}{2I}}, \qquad (39)$$

such that:

$$P\Big(|V - C| \le \varepsilon(\eta, I, N, n)\Big) \ge 1 - \alpha(\eta, n). \qquad (40)$$
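To see Theorem 4 at work numerically, a small sketch (Python) evaluating (38) and (39); it relies on the reconstructed form of (39) above, and all parameter values are illustrative:

```python
import math

def alpha(eta, n):
    """Eq. (38): sum_{k=1}^{n} C(n,k) (-1)^(k+1) (2*eta)^k.
    The sum telescopes to 1 - (1 - 2*eta)^n."""
    return sum(math.comb(n, k) * (-1) ** (k + 1) * (2 * eta) ** k
               for k in range(1, n + 1))

def epsilon(eta, I, N, n, B):
    """Eq. (39): width of the bound for functions with 0 <= Q(z, w) <= B."""
    t1 = math.sqrt((math.log(N) - math.log(eta)) / (2 * I))
    t2 = math.sqrt(-math.log(eta) / (2 * I))
    return ((2 * math.sqrt(n / (n - 1)) + 1) * B * t1
            + (math.sqrt(n) + math.sqrt(n / (n - 1))) * B * t2)

eta, I, N, n, B = 0.01, 10_000, 1000, 5, 1.0
print(f"P(|V - C| <= {epsilon(eta, I, N, n, B):.4f})"
      f" >= {1 - alpha(eta, n):.4f}")  # the statement (40)
```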