Theorem 22.5 For every g ∈ G and every 0 < δ < 1, with probability at least 1 − δ,

    R(g) ≤ R_n(g) + 2R(G) + √(log(1/δ)/(2n)),

and also with probability at least 1 − δ,

    R(g) ≤ R_n(g) + 2R_n(G) + √(2 log(2/δ)/n).
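As a quick numerical illustration of the two confidence terms in the theorem above, the following sketch evaluates √(log(1/δ)/(2n)) and √(2 log(2/δ)/n) for one illustrative choice of δ and n (the specific values 0.05 and 1000 are assumptions for the example, not taken from the text):

```python
import math

# Illustrative values (not from the text): confidence level 1 - delta = 0.95,
# sample size n = 1000.
delta, n = 0.05, 1000

# Slack term of the first bound: sqrt(log(1/delta) / (2n)).
term1 = math.sqrt(math.log(1 / delta) / (2 * n))

# Slack term of the second bound: sqrt(2 log(2/delta) / n).
term2 = math.sqrt(2 * math.log(2 / delta) / n)

print(f"first slack term  = {term1:.4f}")   # roughly 0.039
print(f"second slack term = {term2:.4f}")   # roughly 0.086
```

Both terms shrink at the rate 1/√n, so the bounds tighten as more samples are observed; the second bound pays a slightly larger confidence term in exchange for using the empirical (data-dependent) Rademacher average R_n(G).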
Similar to the covering number, it is not difficult to compute upper bounds on the Rademacher average in many cases. For example, for a finite set with |G| = N, we have R_n(G) ≤ 2√(log N / n). And for a function class whose VC dimension is h, we have R_n(G) ≤ C√(h/n).
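For a finite class, the empirical Rademacher average E_σ[sup_g (1/n) Σ_i σ_i g(x_i)] can be estimated directly by Monte Carlo, since the supremum is just a maximum over N function-value vectors. The following sketch does this for a hypothetical class of N random ±1-valued functions (the class itself and all sizes are assumptions for illustration) and compares the estimate with the finite-class bound 2√(log N / n):

```python
import numpy as np

rng = np.random.default_rng(0)

n, N = 200, 50  # sample size and (hypothetical) class size
# Each row holds one function's values g(x_1), ..., g(x_n), here in {-1, +1}.
G = rng.choice([-1.0, 1.0], size=(N, n))

def empirical_rademacher(G, n_trials=2000, rng=rng):
    """Monte Carlo estimate of E_sigma[ sup_g (1/n) sum_i sigma_i g(x_i) ]."""
    n = G.shape[1]
    total = 0.0
    for _ in range(n_trials):
        sigma = rng.choice([-1.0, 1.0], size=n)  # i.i.d. Rademacher signs
        total += np.max(G @ sigma) / n           # sup over the finite class
    return total / n_trials

est = empirical_rademacher(G)
bound = 2 * np.sqrt(np.log(N) / n)
print(f"estimate = {est:.3f}, bound 2*sqrt(log N / n) = {bound:.3f}")
```

With these values the Monte Carlo estimate comes out well below the bound, which is expected: the 2√(log N / n) bound holds for any finite class of bounded functions and is typically loose for a particular one.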