[Figure: three panels for h = 31, h = 151, and h = 231. Top row: axes x1, x2, and y. Bottom row: horizontal axis x1. All axes range from 0.0 to 1.0.]
Fig. 6. Exemplary models for both regression estimation and classification: under-complex (h = 31),
accurately complex with the best generalization (h = 151), over-complex (h = 231).
Suitable theorems about this relationship are stated and proved. The theorems concern
two learning tasks, classification and regression estimation, and two cases regarding
the capacity of the set of approximating functions: finite sets and infinite sets
(but with finite Vapnik-Chervonenkis dimension).
As the sample size grows large, both C and V converge in probability to the same
limit of true risk. The rate of convergence is exponential.
Using the theorems, one can find a threshold sample size such that the difference
C − V or V − C is smaller than an imposed ε. Obviously, the smaller the ε for given
experiment conditions, the more frequently one can expect to select the same optimal model
complexity via SRM and via cross-validation (again without actually performing it).
For the special case of leave-one-out cross-validation, the bounds we derived imply
that at most a constant difference of order O(−ln η / 2) between C and V can be expected.
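The idea of a threshold sample size can be illustrated with a minimal sketch. It does not reproduce the theorems summarized here. Instead it assumes the classical one-sided bound for a finite set of N functions with loss bounded in [0, 1], where the gap between true and empirical risk is at most sqrt((ln N − ln η) / (2 I)) with probability at least 1 − η, and I is the sample size. The function names and the example values are hypothetical.

```python
import math


def vapnik_margin(num_functions: int, sample_size: int, eta: float) -> float:
    """Margin of the assumed finite-set bound: sqrt((ln N - ln eta) / (2 I))."""
    return math.sqrt(
        (math.log(num_functions) - math.log(eta)) / (2.0 * sample_size)
    )


def threshold_sample_size(num_functions: int, eta: float, epsilon: float) -> int:
    """Smallest sample size I for which the assumed margin is at most epsilon.

    Obtained by solving sqrt((ln N - ln eta) / (2 I)) <= epsilon for I.
    """
    return math.ceil(
        (math.log(num_functions) - math.log(eta)) / (2.0 * epsilon ** 2)
    )


# Hypothetical setting: 100 candidate functions, eta = 0.05, epsilon = 0.1.
size = threshold_sample_size(num_functions=100, eta=0.05, epsilon=0.1)
margin = vapnik_margin(num_functions=100, sample_size=size, eta=0.05)
print(size, margin)
```

Halving ε roughly quadruples the threshold under this assumed bound, because ε enters the formula squared.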
Additionally, we showed for what number n of folds the bounds (lower and upper)
on the difference are the tightest. Interestingly, it turns out that these optimal n values
do not depend on the sample size.
Finally, experiments confirming the statistical correctness of the bounds are presented.
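For readers unfamiliar with the procedure the bounds refer to, the following is a generic sketch of n-fold cross-validation. It is not the experimental code behind the results above. The toy learner `fit_mean`, the loss `squared_error`, and the data are hypothetical and chosen only so that the example is self-contained.

```python
import random
from statistics import mean


def n_fold_cross_validation(xs, ys, n, fit, loss, seed=0):
    """Return the mean held-out loss over n disjoint folds."""
    indices = list(range(len(xs)))
    random.Random(seed).shuffle(indices)
    # Split the shuffled indices into n nearly equal, disjoint folds.
    folds = [indices[k::n] for k in range(n)]
    fold_losses = []
    for fold in folds:
        held_out = set(fold)
        train_x = [xs[i] for i in indices if i not in held_out]
        train_y = [ys[i] for i in indices if i not in held_out]
        model = fit(train_x, train_y)
        # Average the loss on the fold that was not used for training.
        fold_losses.append(mean(loss(model(xs[i]), ys[i]) for i in fold))
    return mean(fold_losses)


def fit_mean(train_x, train_y):
    """Toy learner (hypothetical): always predicts the mean training target."""
    target_mean = mean(train_y)
    return lambda x: target_mean


def squared_error(prediction, target):
    return (prediction - target) ** 2


xs = [0.0, 1.0, 2.0, 3.0]
ys = [0.0, 0.0, 1.0, 1.0]
# Setting n equal to the sample size gives leave-one-out cross-validation.
loo_estimate = n_fold_cross_validation(xs, ys, n=len(xs), fit=fit_mean, loss=squared_error)
print(loo_estimate)
```

Setting `n` to the sample size yields the leave-one-out special case, while smaller values of `n` trade training-set size against the number of held-out points per fold.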
Acknowledgements.
This work has been financed by the Polish Government, Ministry
of Science and Higher Education, from the sources for science within the years 2010-2012.
Research project no.: N N516 424938.