asserts that, on the basis of the values of the empirical risk, one can estimate the minimal possible value of the risk.
8.2.2 Key theorem of learning theory
In 1989 Vapnik and Chervonenkis proposed the key theorem of learning theory as follows (Vapnik and Chervonenkis, 1991).
Theorem 8.1 Let $L(y, w)$, $w \in \Lambda$, be a set of functions that satisfy the condition

$$A \le \int L(y, w)\, dF(y) \le B \qquad (A \le R(w) \le B). \tag{8.6}$$
Then for the ERM principle to be consistent in the following sense:
$$\lim_{l \to \infty} P\Bigl\{ \sup_{w \in \Lambda} \bigl( R(w) - R_{\mathrm{emp}}(w) \bigr) > \varepsilon \Bigr\} = 0 \qquad \forall \varepsilon > 0, \tag{8.7}$$
it is necessary and sufficient that the empirical risk $R_{\mathrm{emp}}(w)$ converge uniformly to the actual risk $R(w)$ over the set of functions $L(y, w)$, $w \in \Lambda$. We call this type of uniform convergence uniform one-sided convergence.
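As a concrete illustration of (8.7), here is a minimal Monte Carlo sketch that estimates $P\{\sup_{w \in \Lambda}(R(w) - R_{\mathrm{emp}}(w)) > \varepsilon\}$ for a small finite class of threshold classifiers. The model, the noise level, and the grid of thresholds are illustrative assumptions, not part of the text; the actual risk is available in closed form only because the toy distribution is fully known.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative toy model (an assumption, not from the text):
# x ~ Uniform(0, 1), true label 1[x > 0.5], flipped with probability NOISE.
# The loss is the 0-1 loss of the threshold classifiers f_w(x) = 1[x > w].
NOISE = 0.1
thresholds = np.linspace(0.0, 1.0, 21)  # a finite stand-in for the set Lambda

def actual_risk(w):
    """R(w) = P(f_w(x) != y), in closed form for this toy distribution."""
    # f_w disagrees with the true rule on a region of length |w - 0.5|.
    return NOISE + (1.0 - 2.0 * NOISE) * abs(w - 0.5)

def empirical_risk(w, x, y):
    """R_emp(w): average 0-1 loss of f_w on the sample (x, y)."""
    return np.mean((x > w).astype(int) != y)

def sup_deviation(l):
    """sup over w in Lambda of (R(w) - R_emp(w)) for one sample of size l."""
    x = rng.uniform(0.0, 1.0, l)
    flip = (rng.uniform(size=l) < NOISE).astype(int)
    y = (x > 0.5).astype(int) ^ flip
    return max(actual_risk(w) - empirical_risk(w, x, y) for w in thresholds)

# P{sup_w (R(w) - R_emp(w)) > eps} shrinks as the sample size l grows.
eps, trials = 0.05, 500
for l in (10, 100, 1000):
    p = np.mean([sup_deviation(l) > eps for _ in range(trials)])
    print(f"l = {l:4d}:  P(sup deviation > {eps}) ~ {p:.3f}")
```

For a finite Λ the supremum is a maximum over the grid; the printed probabilities decrease toward 0 as $l$ grows, which is exactly the consistency statement (8.7).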
8.2.3 VC entropy
Definition 8.2 Let $A \le L(y, w) \le B$, $w \in \Lambda$, be a set of bounded loss functions. Using this set of functions and the training set $z_1, \ldots, z_l$, one can construct the following set of $l$-dimensional vectors:
$$q(w) = \bigl( L(z_1, w), \ldots, L(z_l, w) \bigr), \quad w \in \Lambda. \tag{8.8}$$
This set of vectors belongs to the $l$-dimensional cube and has a finite minimal $\varepsilon$-net in the metric $C$ (or in the metric $L_p$).
Let $N^{\Lambda} = N^{\Lambda}(\varepsilon; z_1, \ldots, z_l)$ be the number of elements of the minimal $\varepsilon$-net of this set of vectors $q(w)$, $w \in \Lambda$. Note that $N^{\Lambda}(\varepsilon; z_1, \ldots, z_l)$ is a random variable, since it is constructed using the random vectors $z_1, \ldots, z_l$.
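To make the construction concrete, the sketch below builds the vectors $q(w)$ of (8.8) for a toy squared-loss model and counts the elements of a greedy $\varepsilon$-net in the metric $C$ (the sup norm). The model and the finite grid standing in for $\Lambda$ are assumptions for illustration, and the greedy cover only upper-bounds the minimal $\varepsilon$-net, which is hard to compute exactly.

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative toy model (an assumption, not from the text):
# z = (x, y) with x ~ Uniform(-1, 1), y = 0.7 x + noise, and the
# bounded squared loss L(z, w) = (y - w x)^2 over a finite grid of w.
l = 50
x = rng.uniform(-1.0, 1.0, l)
y = 0.7 * x + 0.1 * rng.standard_normal(l)
weights = np.linspace(-2.0, 2.0, 401)  # finite stand-in for Lambda

# Equation (8.8): one l-dimensional loss vector q(w) per function.
Q = np.stack([(y - w * x) ** 2 for w in weights])  # shape (|Lambda|, l)

def greedy_eps_net(Q, eps):
    """Greedy cover of the rows of Q in the metric C (sup norm).

    Its size upper-bounds the size of the minimal eps-net of {q(w)}."""
    centers = []
    uncovered = np.ones(len(Q), dtype=bool)
    while uncovered.any():
        i = int(np.flatnonzero(uncovered)[0])    # pick an uncovered vector
        centers.append(i)
        dist = np.max(np.abs(Q - Q[i]), axis=1)  # C-metric distance to q(w_i)
        uncovered &= dist > eps                  # points within eps are covered
    return centers

for eps in (0.5, 0.2, 0.1):
    print(f"eps = {eps}:  greedy eps-net size = {len(greedy_eps_net(Q, eps))}")
```

The printed count plays the role of $N^{\Lambda}(\varepsilon; z_1, \ldots, z_l)$ defined above (up to the greedy approximation): it is random because it depends on the drawn sample $z_1, \ldots, z_l$.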
The logarithm of the random value $N^{\Lambda}(\varepsilon; z_1, \ldots, z_l)$,

$$H^{\Lambda}(\varepsilon; z_1, \ldots, z_l) = \ln N^{\Lambda}(\varepsilon; z_1, \ldots, z_l),$$

is called the random VC entropy of the set of functions $A \le L(y, w) \le B$, $w \in \Lambda$, on the sample $z_1, \ldots, z_l$. The expectation of the random VC entropy,

$$H^{\Lambda}(\varepsilon; l) = E H^{\Lambda}(\varepsilon; z_1, \ldots, z_l),$$

is called the VC entropy of the set of functions on samples of size $l$.
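Continuing the same sketch (it reuses `weights` and `greedy_eps_net` from the block above, so the same illustrative assumptions apply), the expectation $H^{\Lambda}(\varepsilon; l) = E H^{\Lambda}(\varepsilon; z_1, \ldots, z_l)$ can be approximated by averaging the random VC entropy over independent samples of size $l$:

```python
def random_vc_entropy(l, eps, rng):
    """ln N(eps; z_1, ..., z_l) for one random sample, via the greedy net."""
    x = rng.uniform(-1.0, 1.0, l)
    y = 0.7 * x + 0.1 * rng.standard_normal(l)
    Q = np.stack([(y - w * x) ** 2 for w in weights])
    return np.log(len(greedy_eps_net(Q, eps)))

# Plain Monte Carlo stand-in for the expectation E H(eps; z_1, ..., z_l).
eps, trials = 0.2, 100
for l in (10, 50, 250):
    H = np.mean([random_vc_entropy(l, eps, rng) for _ in range(trials)])
    print(f"l = {l:4d}:  H(eps; l) ~ {H:.2f}")
```

In this toy model the averaged entropy grows with $l$ and then saturates, since the sup-norm distances between the loss vectors approach the sup-norm distances between the loss functions themselves as the sample fills the domain.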