Information Technology Reference
In-Depth Information
F
Definition 22.3
The covering number of
at radius
ε
, with respect to
d
n
, denoted
as
N(
F
,ε,n)
is the minimum size of function set that can cover
F
at radius
ε
.
According to the relationship between
F
and
G
, it is not difficult to obtain that
N(
F
,ε,n)
=
N(
G
,ε,n)
.
When the covering number of a function class is finite, one can approximate the
class by a finite set of functions that cover the class. In this way, we can make use
of the finite union bound. The following results have been proven.
Theorem 22.4
∀
t>
0,
P
∃
g
∈
G
:
R(g) >
R(g)
+
t
≤
8
E
N(
G
,t,n)
e
−
nt
2
/
128
.
Actually, the covering number is not difficult to compute for many function
classes. For example, if
G
is a compact set in a
d
-dimensional Euclidean space,
ε
−
d
. For a function class
we have
N(
G
,ε,n)
≈
G
whose VC dimension is
h
,we
have
N(
G
,ε,n)
≤
Ch(
4
e)
h
ε
−
h
.
22.3.2.5 Uniform Bounds Based on Rademacher Average
To facilitate the concept of Rademacher average, we need to introduce the concept of
Rademacher variables first. Rademacher variables are independent random variables
σ
1
,...,σ
n
with
P(σ
i
=
1
/
2. We denote
E
σ
as the expectation
taken with respect to the Rademacher variables (i.e., conditionally on the data) while
E
as the expectation with respect to all the random variables. Then we have the
following definitions.
1
)
=
P(σ
i
=−
1
)
=
Definition 22.4
For a function class
F
, the Rademacher average is defined as
E
sup
f
∈
F
σ
i
f(Z
i
)
n
1
n
R
(
F
)
=
i
=
1
and the conditional Rademacher average is defined as
sup
f
σ
i
f(Z
i
)
.
n
1
n
R
n
(
F
)
=
E
σ
∈
F
i
=
1
The intuitive explanation of the Rademacher average is that it measures how
much the function class can fit random noise. According to the relationship between
F
1
G
R
n
(
F
=
2
R
n
(
G
)
.
With the definition of the Rademacher average, one has the following theorem.
and
, it is not difficult to obtain that
)