Information Technology Reference
In-Depth Information
∈[
]
Theorem 22.1
Let Z
1
,...,Z
n
be n i
.
i
.
d
.
random variables with f(Z)
a,b
.
Then
∀
ε>
0,
we have
2exp
.
n
f(Z
i
)
−
E
f(Z)
2
nε
2
(b
1
n
P
>ε
≤
−
−
a)
2
i
=
1
If we denote the right-hand side of the Hoeffding's Inequality as
δ
, we can have
the following equivalent form of it: with probability at least 1
−
δ
,
log
δ
2
n
≤
n
E
f(Z)
1
n
f(Z
i
)
−
(b
−
a)
.
i
=
1
. Further considering the relationship between
f
and
g
,
we have with probability at least 1
In our case,
f(Z)
∈[
0
,
1
]
−
δ
,
log
δ
2
n
≤
R(g)
R(g)
+
.
The above bound looks quite nice, however, it should be noted that the bound
also has its limitations. The major issue lies in that Hoeffding's Inequality holds for
afixedfunction
f
, and the probability is with regards to the sampling of the data.
That is, given a function
f
, it is of large probability that there is a set of samples
satisfying the inequality. However, these sets can be different for different functions.
As a result, for a fixed sample, only some of the functions in
F
will satisfy the
inequality.
To tackle the aforementioned problem, one choice is to consider the supremum
of all the functions and derive a looser bound. We will make more introductions on
this in the next subsection.
22.3.2.2 Uniform Bounds in Finite Case
In order to obtain a uniform bound which is not data dependent, we need to consider
the failure case for each function in the class. For ease of discussion, here we assume
that there are a finite number of functions in the class, and denote the number as
N
.
Then, for
f
k
(k
1
,...,N)
,weuse
C
i
to denote the set of “bad” samples, i.e.,
those for which the bound given by Hoeffding's inequality fails. That is,
=
z
>ε
.
n
E
f
k
(Z)
1
n
C
k
=
:
f
k
(Z
i
)
−
i
=
1
δ
. Then according to the properties of probabilities of
union sets, we can obtain that
P(C
1
∪···∪
C
N
)
≤
Nδ
. As a result, we have
It is clear that
P(C
k
)
≤