since the selection of optimal complexity is based on the guaranteed generalization risk. The bound for the guaranteed risk is expressed in terms of the Vapnik-Chervonenkis dimension, and is a pessimistic overestimation of the growth function, which in turn is an overestimation of the unknown Vapnik-Chervonenkis entropy. We formally recall these notions later in the paper. All these overestimations contribute (unfortunately) to the fact that, for a fixed sample size, SRM usually underestimates the optimal complexity and chooses too simple a model.
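As a side illustration (ours, not from the paper), the SRM selection rule can be sketched numerically: for each model class of VC dimension h, add Vapnik's confidence term to the empirical error and pick the minimizer of the resulting guaranteed risk. The particular bound form, the sample size, and the empirical error values below are illustrative assumptions.

```python
import math

def vc_confidence(h, I, eta=0.05):
    """Vapnik's confidence term for VC dimension h, sample size I,
    and confidence level 1 - eta (a pessimistic bound)."""
    return math.sqrt((h * (math.log(2 * I / h) + 1) - math.log(eta / 4)) / I)

def srm_select(emp_errors, I, eta=0.05):
    """Pick the model class (indexed by h = 1, 2, ...) minimizing the
    guaranteed risk = empirical error + VC confidence term."""
    bounds = [e + vc_confidence(h, I, eta)
              for h, e in enumerate(emp_errors, start=1)]
    best_h = min(range(len(bounds)), key=bounds.__getitem__) + 1
    return best_h, bounds

# Hypothetical empirical errors decreasing with complexity h = 1..5:
emp = [0.30, 0.18, 0.12, 0.10, 0.09]
h_star, bounds = srm_select(emp, I=200)
```

Note that the empirical error alone is minimized at the most complex class (h = 5), while the guaranteed risk points at a simpler one, which is exactly the conservatism of SRM discussed above.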
Results presented in this paper may be regarded as conceptually akin to the results of Holden [11,12], where error bounds on cross-validation and so-called sanity-check bounds are derived. The sanity-check bound is a proof, for a large class of learning algorithms, that the error of the leave-one-out estimate is not much worse, namely O(h/I), than the worst-case behavior of the training error estimate, where h stands for the Vapnik-Chervonenkis dimension of the given set of functions and I stands for the sample size. The name sanity-check refers to the fact that although we believe that under many circumstances the leave-one-out estimate will perform better than the training error (and thus justify its computational expense), the goal of the sanity-check bound is simply to prove that it is not much worse than the training error [13].
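As a concrete illustration (ours, not Holden's), the contrast between the training error and the leave-one-out estimate can be seen on a tiny 1-nearest-neighbour example: 1-NN memorizes the training sample, so its training error is zero, whereas leave-one-out exposes a nonzero error. The data points below are invented for illustration.

```python
import numpy as np

def nn_predict(train_x, train_y, x):
    """1-nearest-neighbour prediction in one dimension."""
    return train_y[np.argmin(np.abs(train_x - x))]

def loo_error(x, y):
    """Leave-one-out estimate: train on I - 1 points, test on the held-out one."""
    errs = 0
    for i in range(len(x)):
        mask = np.arange(len(x)) != i
        errs += nn_predict(x[mask], y[mask], x[i]) != y[i]
    return errs / len(x)

x = np.array([0.0, 0.1, 0.2, 0.8, 0.9, 1.0, 0.45])
y = np.array([0,   0,   0,   1,   1,   1,   1])

# 1-NN memorizes the sample, so the resubstitution (training) error is 0:
train_err = np.mean([nn_predict(x, y, xi) != yi for xi, yi in zip(x, y)])
loo = loo_error(x, y)  # nonzero: the isolated point at 0.45 is misclassified
```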
These results were further generalized by Kearns [13,14,15] using the notion of (β1, β2)-error stability² rather than (β1, β2)-hypothesis stability³ imposed on the learning algorithm.
For the sake of comparison, and to set up the perspective for further reading of this paper, we highlight some differences between the meaning of our results and the results mentioned above:
- we do not focus on how well the measured cross-validation result estimates the generalization error, or how far it is from the training error in the leave-one-out case (the sanity-check bounds [12,13]); instead, we want to make statements about the cross-validation result without actually measuring it, thus remaining in the setting of the SRM framework.
- in particular, we want to state probabilistically what ε-difference one can expect between the known Vapnik bound and the unknown cross-validation result for given conditions of the experiment,
- in consequence, we want to be able to calculate the necessary size of the training sample, so that ε is sufficiently small and so that the optimal complexity indicated via SRM is acceptable in the sense that cross-validation, if performed, would probably indicate the same complexity; this statement may seem related to the notion of sample complexity considered e.g. by Bartlett [16,17] or Ng [18], but we do not find the sample size required for the algorithm to learn/generalize “well”
² We say that a learning algorithm has (β1, β2)-error stability if the generalization errors of two models provided by this algorithm using, respectively, a training sample of size I and a sample with size lowered to I − 1 are β1-close to each other with probability at least 1 − β2. Obviously, the smaller both β1, β2 are, the more stable the algorithm.
³ We say that a learning algorithm has (β1, β2)-hypothesis stability if the two models provided by this algorithm using, respectively, a training sample of size I and a sample with size lowered to I − 1 are β1-close to each other with probability at least 1 − β2, where closeness of models is measured by some functional metric, e.g. L1, L2, etc.
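A minimal numerical sketch (our illustration, not Kearns') of checking hypothesis stability in the L2 sense: run the same algorithm on samples of size I and I − 1 and measure the L2 distance between the two resulting models. The least-squares linear model and the synthetic data are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def fit_line(x, y):
    """Least-squares fit y ≈ a*x + b; returns the model as a function."""
    a, b = np.polyfit(x, y, 1)
    return lambda t: a * t + b

# Sample of size I from a noisy linear target (illustrative assumption).
I = 50
x = rng.uniform(0, 1, I)
y = 2 * x + 1 + rng.normal(0, 0.1, I)

f_full = fit_line(x, y)              # model trained on I points
f_minus = fit_line(x[:-1], y[:-1])   # model trained on I - 1 points

# L2 distance between the two hypotheses over [0, 1], estimated on a grid;
# a stable algorithm keeps this distance small with high probability.
grid = np.linspace(0, 1, 1000)
l2_dist = np.sqrt(np.mean((f_full(grid) - f_minus(grid)) ** 2))
```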