Information Technology Reference
InDepth Information
The first term on the righthand side of the above equation is called the approx
imation error, which measures how well the best function in
G
can approach the
target. The second term, called the estimation error, measures how close
g
n
is to the
best function in
G
. In the literature of statistical learning theory, most attention has
been paid to the estimation error, which is also the focus of our introduction in this
section.
The bound regarding the estimation error takes the form
R(g
∗
)
R(g
n
)
≤
+
B(n,
G
).
Considering that
g
n
minimizes the empirical risk in
G
,wehave
R(g
∗
)
−
R(g
n
)
≥
0
.
Then we obtain the inequality
R(g
n
)
=
R(g
n
)
−
R(g
∗
)
+
R(g
∗
)
≤
R(g
∗
)
−
R(g
n
)
+
R(g
n
)
−
R(g
∗
)
+
R(g
∗
)
R(g)
R(g)
+
R(g
∗
).
≤
2sup
g
−
∈
G
Based on the above inequality, it is clear that if one can obtain a bound for
−
R(g)

, one will correspondingly obtain a bound for the estimation error.
This is what many previous works have been doing.
R(g)

R(g)

−

22.3.2 Bounds for
R(g)
−
R(g)
In order to ease the analysis on

R(g)

, here we define a new function class
called the loss class:
F
=
f
∈
G
.
:
(x, y)
→
I
,g
{
g(x)
=
y
}
n
i
=
1
f(Z
i
)
, where
Z
R(g)
1
Then it is clear that
R(g)
=
E
[
f(Z)
]
, and
=
=
(X, Y )
and
Z
i
=
(X
i
,Y
i
)
.
22.3.2.1 Hoeffding's Inequality
One may have noticed that the relationship between
R(g)
and
R(g)
is actually the
relationship between expectation and average. According to the law of large num
ber, when the number of samples approaches infinity,
R(g)
will converge to
R(g)
.
Furthermore, there is actually a quantitative version of the law of large number,
when the function
g
is bounded. This is the famous Hoeffding's Inequality.