Information Technology Reference
In-Depth Information
F
Definition 22.3 The covering number of
at radius ε , with respect to d n , denoted
as N(
F
,ε,n) is the minimum size of function set that can cover
F
at radius ε .
According to the relationship between
F
and
G
, it is not difficult to obtain that
N( F ,ε,n) = N( G ,ε,n) .
When the covering number of a function class is finite, one can approximate the
class by a finite set of functions that cover the class. In this way, we can make use
of the finite union bound. The following results have been proven.
Theorem 22.4
t> 0,
P g G : R(g) >
R(g) + t
8 E N( G ,t,n) e nt 2 / 128 .
Actually, the covering number is not difficult to compute for many function
classes. For example, if
G
is a compact set in a d -dimensional Euclidean space,
ε d . For a function class
we have N(
G
,ε,n)
G
whose VC dimension is h ,we
have N( G ,ε,n) Ch( 4 e) h ε h .
22.3.2.5 Uniform Bounds Based on Rademacher Average
To facilitate the concept of Rademacher average, we need to introduce the concept of
Rademacher variables first. Rademacher variables are independent random variables
σ 1 ,...,σ n with P(σ i =
1 / 2. We denote E σ as the expectation
taken with respect to the Rademacher variables (i.e., conditionally on the data) while
E as the expectation with respect to all the random variables. Then we have the
following definitions.
1 ) = P(σ i =−
1 ) =
Definition 22.4 For a function class
F
, the Rademacher average is defined as
E sup
f F
σ i f(Z i )
n
1
n
R
(
F
)
=
i =
1
and the conditional Rademacher average is defined as
sup
f
σ i f(Z i ) .
n
1
n
R n (
F
)
=
E σ
F
i =
1
The intuitive explanation of the Rademacher average is that it measures how
much the function class can fit random noise. According to the relationship between
F
1
G
R n (
F
=
2 R n (
G
) .
With the definition of the Rademacher average, one has the following theorem.
and
, it is not difficult to obtain that
)
Search WWH ::




Custom Search