The first term on the right-hand side of the above equation is called the approximation error, which measures how well the best function in G can approach the target. The second term, called the estimation error, measures how close g_n is to the best function in G. In the literature of statistical learning theory, most attention has been paid to the estimation error, which is also the focus of our introduction in this chapter.
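For reference, the decomposition referred to above can be written in its standard form (reconstructed here; R* denotes the risk of the Bayes optimal function and g* = argmin_{g∈G} R(g) the best function in G):

```latex
R(g_n) - R^{*}
  = \underbrace{\bigl(R(g^{*}) - R^{*}\bigr)}_{\text{approximation error}}
  + \underbrace{\bigl(R(g_n) - R(g^{*})\bigr)}_{\text{estimation error}}
```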
The bound regarding the estimation error takes the form

R(g_n) − R(g*) ≤ 2 sup_{g∈G} |R̂(g) − R(g)|.

Considering that g_n minimizes the empirical risk in G, we have R̂(g_n) ≤ R̂(g*), i.e., R̂(g*) − R̂(g_n) ≥ 0. Then we obtain the inequality

R(g_n) = R(g_n) − R̂(g_n) + R̂(g_n)
       ≤ R(g_n) − R̂(g_n) + R̂(g*)
       = [R(g_n) − R̂(g_n)] + [R̂(g*) − R(g*)] + R(g*)
       ≤ 2 sup_{g∈G} |R̂(g) − R(g)| + R(g*).
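The chain of inequalities above can be verified numerically. The following is a minimal sketch; the distribution, the finite class of threshold classifiers, and the 0-1 loss are all hypothetical choices for illustration, not taken from the text:

```python
import random

random.seed(0)

# Hypothetical setup: X ~ Uniform[0, 1], Y = 1{X > 0.3}, and G is a
# finite class of threshold classifiers g_t(x) = 1{x > t} under 0-1 loss.
thresholds = [i / 20 for i in range(21)]

def true_risk(t):
    # R(g_t) = P(g_t(X) != Y); exactly the points between t and 0.3
    # are misclassified, so the risk is |t - 0.3|.
    return abs(t - 0.3)

n = 200
data = [(x, 1 if x > 0.3 else 0) for x in (random.random() for _ in range(n))]

def emp_risk(t):
    # Empirical risk R_hat(g_t): average 0-1 loss on the sample.
    return sum(1 for x, y in data if (1 if x > t else 0) != y) / n

g_star = min(thresholds, key=true_risk)   # best function in G
g_n = min(thresholds, key=emp_risk)       # empirical risk minimizer
sup_dev = max(abs(emp_risk(t) - true_risk(t)) for t in thresholds)

# The derived bound: R(g_n) <= 2 * sup_g |R_hat(g) - R(g)| + R(g*),
# i.e., the estimation error is at most twice the supremum deviation.
assert true_risk(g_n) <= 2 * sup_dev + true_risk(g_star) + 1e-12
```

The final assertion holds for any sample, since the derivation above is distribution-free; only the tightness of the bound depends on the data.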
Based on the above inequality, it is clear that if one can obtain a bound for sup_{g∈G} |R̂(g) − R(g)|, one will correspondingly obtain a bound for the estimation error. This is what many previous works have done.
22.3.2 Bounds for sup_{g∈G} |R̂(g) − R(g)|
In order to ease the analysis on sup_{g∈G} |R̂(g) − R(g)|, here we define a new function class, called the loss class:

F = { f : (x, y) ↦ ℓ(g(x), y), g ∈ G },

where ℓ denotes the loss function. Then it is clear that

R(g) = E[f(Z)],   R̂(g) = (1/n) Σ_{i=1}^n f(Z_i),

where Z = (X, Y) and Z_i = (X_i, Y_i).
22.3.2.1 Hoeffding's Inequality
One may have noticed that the relationship between R(g) and R̂(g) is actually the relationship between an expectation and an empirical average. According to the law of large numbers, when the number of samples approaches infinity, R̂(g) will converge to R(g). Furthermore, there is a quantitative version of the law of large numbers when the loss values are bounded. This is the famous Hoeffding inequality.
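As a concrete illustration, Hoeffding's inequality for losses taking values in [0, 1] states that P(|R̂(g) − R(g)| ≥ ε) ≤ 2 exp(−2nε²). This can be checked by simulation; the Bernoulli loss distribution below is a hypothetical choice for the sketch:

```python
import math
import random

random.seed(1)

# Hoeffding's inequality for f(Z) in [0, 1]:
#   P(|R_hat - R| >= eps) <= 2 * exp(-2 * n * eps**2)
# Monte Carlo check with f(Z) ~ Bernoulli(0.3), so R = E[f(Z)] = 0.3.
R, n, eps, trials = 0.3, 100, 0.1, 20000

violations = 0
for _ in range(trials):
    r_hat = sum(1 if random.random() < R else 0 for _ in range(n)) / n
    if abs(r_hat - R) >= eps:
        violations += 1

empirical = violations / trials            # observed deviation frequency
bound = 2 * math.exp(-2 * n * eps ** 2)    # Hoeffding bound, 2e^{-2}

assert empirical <= bound
```

With these parameters the observed frequency of large deviations stays well below the bound, consistent with the fact that Hoeffding's inequality is valid for every bounded loss but not tight for any particular distribution.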