By definition, the VC dimension is the capacity of the learning algorithm to shatter points in the input space (Vapnik, 1998). Formally, it is the cardinality of the largest set of points that the algorithm can shatter. The importance of the VC dimension is that it appears explicitly in the bound on the total error of an algorithm. The total error of the learning machine is the sum of the training error (empirical error) and the testing error (generalization error):
$$\varepsilon = \varepsilon_{emp} + \varepsilon_{g}$$

where ε_emp is the training error and ε_g is the generalization error (Vapnik, 1998). ε_emp can be made arbitrarily small by choosing a machine with a VC dimension at least equal to the number of training points; in order to decrease the training error, one has to increase the VC dimension. If a machine can separate all the training points without any errors, it will have ε_emp = 0. Vapnik has established a bound on the testing error given by:

$$\varepsilon_{g} < \sqrt{\frac{VC\left(\ln\frac{2l}{VC} + 1\right) - \ln\frac{\eta}{4}}{l}}$$

where l is the number of training points, VC is the VC dimension (referred to as h in some references), and 1 - η is the probability with which this inequality holds. The inequality shows that the generalization error is bounded by an increasing function of the VC dimension, and thus a trade-off must be made between the empirical error and the generalization error.
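To see how this bound behaves, the following minimal sketch evaluates its right-hand side for a few VC dimensions; the values of l, VC and η are arbitrary illustrative choices, not taken from the text.

```python
import math

def vc_confidence(l, vc, eta):
    """Right-hand side of Vapnik's bound on the generalization error:
    sqrt((VC * (ln(2l/VC) + 1) - ln(eta/4)) / l)."""
    return math.sqrt((vc * (math.log(2 * l / vc) + 1) - math.log(eta / 4)) / l)

# Arbitrary illustrative values: 1000 training points, confidence 1 - eta = 0.95.
for vc in (10, 50, 200):
    print(vc, round(vc_confidence(l=1000, vc=vc, eta=0.05), 3))
# The printed bound grows with the VC dimension, which is exactly the
# trade-off discussed above.
```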
Although it is extremely difficult and sometimes impossible to compute the VC dimension of a given algorithm, a bound on the VC dimension has been established and will be very useful in building the confidence measure. Vapnik states that a bound on the VC dimension is given by:

$$VC < \|w\|^{2} D^{2}$$

where D is the minimum radius of the sphere that contains all the training points and ||w|| is the norm of the weight vector that the SVM is minimizing. ||w||² is given by (Vapnik, 1998):

$$\|w\|^{2} = \sum_{i=1}^{n} \alpha_{i}$$
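This identity can be seen from the optimality conditions of the hard-margin SVM; sketching the standard argument for the separable case, the KKT complementarity conditions give α_i [y_i (w·x_i + b) − 1] = 0 for every i, so

$$\sum_{i=1}^{n}\alpha_{i} = \sum_{i=1}^{n}\alpha_{i}\,y_{i}\,(w\cdot x_{i}+b) = w\cdot\Big(\sum_{i=1}^{n}\alpha_{i}y_{i}x_{i}\Big) + b\sum_{i=1}^{n}\alpha_{i}y_{i} = \|w\|^{2},$$

using w = Σ_i α_i y_i x_i and Σ_i α_i y_i = 0 at the optimum.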
This bound is important in two ways: it is easy to compute, and Burges has shown that the true VC dimension is closely related to it. In particular, he showed that, most of the time, the true minimum of the VC dimension is obtained when this bound is minimal (Burges, 1998).
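As an illustration of how easily this bound can be evaluated, the sketch below assumes S is a nearly hard-margin linear SVM trained with scikit-learn; the synthetic data, the large value of C, and the centroid-based over-estimate of D (the exact D would require solving a minimum-enclosing-ball problem) are illustrative choices.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Two well-separated Gaussian blobs as a stand-in for a separable training set.
X = np.vstack([rng.normal(-2, 0.5, (50, 2)), rng.normal(2, 0.5, (50, 2))])
y = np.array([-1] * 50 + [1] * 50)

# A large C approximates the hard-margin SVM assumed by the identity above.
clf = SVC(kernel="linear", C=1e6).fit(X, y)

# ||w||^2 two ways: sum of the dual variables alpha_i, and directly from w.
w_norm_sq_dual = float(np.abs(clf.dual_coef_).sum())   # sum_i alpha_i
w_norm_sq_primal = float(np.sum(clf.coef_ ** 2))        # w . w

# Over-estimate of D: radius of the centroid-centered sphere containing X.
D = float(np.linalg.norm(X - X.mean(axis=0), axis=1).max())

print("||w||^2 (dual):  ", w_norm_sq_dual)
print("||w||^2 (primal):", w_norm_sq_primal)
print("VC bound ||w||^2 * D^2:", w_norm_sq_primal * D ** 2)
```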
Notation. The following notation will be used in the definitions and propositions that follow:

X = {x_1, x_2, ..., x_n} is the set of training vectors, where x_i ∈ R^m, and Y = {y_1, y_2, ..., y_n} is the set of their corresponding classes, where y_i ∈ {-1, 1}.

S(X, Y) is any learning algorithm trained on the vectors X with classes Y, and in which all the parameters of S are set.
D is the radius of the smallest sphere that encloses all the training points, and w is the weight vector of the hyperplane.
z is an unlabeled vector and d(z) ∈ {-1, 1} is its prediction.
X_z is the set of training points X to which z is appended: X_z = {X, z}.

Y_{-1} is the set of labels Y to which (-1) is appended: Y_{-1} = {Y, -1}, and Y_{1} is the set of labels Y to which (1) is appended: Y_{1} = {Y, 1}.
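To make this notation concrete, the sketch below shows one way the augmented sets might be built and used in code; the helper augmented_models is hypothetical, and instantiating S as a linear SVM is an illustrative assumption.

```python
import numpy as np
from sklearn.svm import SVC

def augmented_models(X, Y, z):
    """Train S(X_z, Y_-1) and S(X_z, Y_1): the same learner fitted on the
    training set with z appended under each of the two candidate labels."""
    X_z = np.vstack([X, z])                  # X_z = {X, z}
    models = {}
    for label in (-1, 1):
        Y_label = np.append(Y, label)        # Y_-1 or Y_1
        models[label] = SVC(kernel="linear", C=1e6).fit(X_z, Y_label)
    return models

# Tiny illustrative data set and unlabeled query point (arbitrary values).
X = np.array([[-2.0, -1.5], [-1.8, -2.2], [2.1, 1.9], [1.7, 2.3]])
Y = np.array([-1, -1, 1, 1])
z = np.array([[0.4, 0.6]])

models = augmented_models(X, Y, z)
print({label: m.n_support_.tolist() for label, m in models.items()})
```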
Based on the notation presented, X_z represents a new training set that contains X to which z is added, and Y_{-1} is the set of labels formed by Y to which (-1) is added. This means that we are assuming that z belongs to class (-1).