Actually, one can easily see that as we take larger samples, $I \to \infty$, and we stick to the leave-one-out cross-validation, $n = I$, the coefficient $\sqrt{n/(n-1)}$ standing at $\sqrt{(\ln N - \ln\eta)/(2I)}$ goes to 1, whereas the coefficient $\sqrt{n}$ standing at $\sqrt{(-\ln\eta)/(2I)}$ goes to infinity.
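Explicitly, substituting $n = I$ (a minimal check of the two limits, using the coefficient forms above):
\[
\sqrt{\frac{n}{n-1}} = \sqrt{\frac{I}{I-1}} \xrightarrow{\;I \to \infty\;} 1,
\qquad
\sqrt{n} = \sqrt{I} \xrightarrow{\;I \to \infty\;} \infty,
\]
so in the leave-one-out regime the second coefficient grows without bound.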
One might ask: for what choice of $n$ is each bound the tightest, given $\eta$, $I$, $N$? Treating $n$ for a moment as a continuous variable, we impose the conditions:
\[
\frac{\partial \varepsilon_U(\eta, I, N, n)}{\partial n} = 0,
\qquad
\frac{\partial \varepsilon_L(\eta, I, N, n)}{\partial n} = 0,
\]
and we get the optimal $n$ values:
\[
n_U = 1 + \left( \frac{\sqrt{\ln N - \ln\eta} + \sqrt{-\ln\eta}}{\sqrt{-\ln\eta}} \right)^{2/3}, \qquad (36)
\]
\[
n_L = 1 + \left( \frac{2\sqrt{\ln N - \ln\eta}}{\sqrt{-\ln\eta}} \right)^{2/3}. \qquad (37)
\]
Note that these values do not depend on the sample size $I$.
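To see how (36) arises (a brief sketch, assuming $\varepsilon_U$ carries the coefficients discussed at the start of this section), abbreviate $a = \sqrt{\ln N - \ln\eta}$ and $b = \sqrt{-\ln\eta}$, so that up to the common factor $1/\sqrt{2I}$ we have $\varepsilon_U \propto \sqrt{n/(n-1)}\,(a + b) + \sqrt{n}\,b$. Then
\[
\frac{\partial \varepsilon_U}{\partial n}
= -\frac{a + b}{2\sqrt{n}\,(n-1)^{3/2}} + \frac{b}{2\sqrt{n}} = 0
\quad\Longleftrightarrow\quad
(n-1)^{3/2} = \frac{a + b}{b},
\]
which yields $n_U = 1 + \bigl((a+b)/b\bigr)^{2/3}$, i.e., (36); the computation for $n_L$ is analogous, with $2a$ in place of $a + b$.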
2.2 Regression Estimation Learning Task
Now we consider a set of real-valued error functions, but we still stay with the simplest case, where the set has a finite number of elements. We give theorems for the regression estimation learning task analogous to the ones for classification. We skip the proofs; the only changes they would require are the assumption of bounded functions and the use of Hoeffding's inequality in place of the Chernoff inequality.
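For reference, Hoeffding's inequality for i.i.d. random variables $Z_1, \ldots, Z_I$ bounded in $[0, B]$, with mean $\mu$, states that
\[
P\left( \left| \frac{1}{I}\sum_{i=1}^{I} Z_i - \mu \right| \ge \epsilon \right)
\le 2\exp\left( -\frac{2 I \epsilon^2}{B^2} \right),
\]
which is the source of the factor $B$ appearing in the bound (39) below.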
Theorem 4. Let $\{Q(z, \omega_j)\}$, $\omega_j \in \Omega$, $j = 1, 2, \ldots, N$, be a finite set of real-valued bounded functions (regression estimation task) of size $N$, with $0 \le Q(z, \omega_j) \le B$. Then, for any $\eta > 0$, arbitrarily small, there is a small number
\[
\alpha(\eta, n) = \sum_{k=1}^{n} \binom{n}{k}\, \eta^{k} (1 - 2\eta)^{n-k}, \qquad (38)
\]
and the number
\[
\varepsilon(\eta, I, N, n)
= \sqrt{\frac{n}{n-1}}\, B \sqrt{\frac{\ln N - \ln\eta}{2I}}
+ \left( \sqrt{\frac{n}{n-1}} + \sqrt{n} \right) B \sqrt{\frac{-\ln\eta}{2I}}, \qquad (39)
\]
such that:
\[
P\left( |V - C| \le \varepsilon(\eta, I, N, n) \right) \ge 1 - \alpha(\eta, n). \qquad (40)
\]
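To get a feel for the magnitudes involved, the following is a minimal numerical sketch of (38) and (39), assuming the forms given above; the function names and the sample values ($\eta = 0.01$, $I = 10^4$, $N = 100$, $B = 1$) are purely illustrative.

from math import comb, log, sqrt

def alpha(eta, n):
    # Eq. (38), as stated above: confidence penalty for n folds.
    return sum(comb(n, k) * eta**k * (1 - 2 * eta)**(n - k)
               for k in range(1, n + 1))

def epsilon(eta, I, N, n, B):
    # Eq. (39): two-sided bound on |V - C| for n-fold cross-validation
    # over a finite set of N functions bounded by 0 <= Q <= B.
    a = sqrt((log(N) - log(eta)) / (2 * I))  # term with ln N - ln eta
    b = sqrt(-log(eta) / (2 * I))            # term with -ln eta
    return sqrt(n / (n - 1)) * B * a + (sqrt(n / (n - 1)) + sqrt(n)) * B * b

# Illustrative values only: eta = 0.01, I = 10**4 samples, N = 100, B = 1.
for n in (2, 5, 10, 100):
    print(n, round(alpha(0.01, n), 4), round(epsilon(0.01, 10**4, 100, n, 1.0), 4))

Running the sketch shows both the penalty $\alpha$ and the width $\varepsilon$ growing with $n$, consistent with the discussion of the leave-one-out regime above.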