One limit that helps in understanding the properties of learning is to take $N \to \infty$, $M \to \infty$, keeping $\alpha \equiv M/N$ constant. This limit is interesting because the relative training set size $\alpha$, which is the ratio of the number of examples to the dimension of input space, is a meaningful quantity in actual (finite-size) applications. Keeping constant the ratio of the number of training patterns to the network's number of parameters (instead of $N$) also provides interesting limits. Those limits are generally (although not necessarily) studied with methods of statistical mechanics [Engel 2001]. In that framework, they are known as the thermodynamic limit. Interestingly, the typical properties evaluated in this limit remain valid for $M$ and $N$ large but finite.
6.7.5.1 The Typical Capacity of the Perceptron
The capacity of a network is the largest number of patterns that the network can learn to discriminate, with probability 1, irrespective of the discrimination task, i.e., whatever the labels $y_k$ of the examples. We mentioned above that the VC-dimension of the perceptron is $N+1$: in other words, if $M < N+1$, $M$ examples in general position can always be separated by a perceptron. However, the probability that more than $M_{VC}$ points are separable does not vanish abruptly at $M = N+1$. The typical capacity of a perceptron was first determined by Cover in 1965 through an inductive geometrical argument. He counted the number of dichotomies of $M$ points that a perceptron can perform in a space of $N$ dimensions. Note that a perceptron performs dichotomies by means of oriented hyperplanes; in the following, the two orientations of the hyperplane are considered as two different dichotomies. As we have seen, in general the same dichotomy of $L_M$ (the same assignment of classes to the $M$ points) can be performed by many different (indeed, an infinite number of) hyperplanes, but it is counted only once. In the particular case of a perceptron without threshold (nor bias), the following result is obtained: for $M \le N$, the number of linearly separable dichotomies of $M$ points in dimension $N$ is $D(M,N) = 2^M$. For $M > N$, the result is
$$D(M,N) = 2 \sum_{m=0}^{N-1} \binom{M-1}{m}.$$
The above result is a geometrical property of points in an $N$-dimensional space, irrespective of the training algorithm.
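As a quick check of this counting formula, the following is a minimal Python sketch (not from the original text; the function name cover_dichotomies is our own) that evaluates $D(M,N)$ and verifies that it reduces to $2^M$ when $M \le N$:

```python
from math import comb

def cover_dichotomies(M: int, N: int) -> int:
    """Cover's count of linearly separable dichotomies of M points in
    general position in R^N (perceptron without threshold):
    D(M, N) = 2 * sum_{m=0}^{N-1} C(M-1, m).
    Terms with m > M-1 vanish, so for M <= N the sum equals 2^(M-1)."""
    return 2 * sum(comb(M - 1, m) for m in range(N))

assert cover_dichotomies(5, 10) == 2 ** 5       # M <= N: all 2^M dichotomies
print(cover_dichotomies(20, 5), "of", 2 ** 20)  # M > N: 10072 of 1048576
```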
Since the total number of possible dichotomies of $M$ points is $2^M$, the probability $P_{LS}(L_M)$ that a set of $M$ points in $N$ dimensions is linearly separable is
$$P_{LS}(L_M) = \frac{D(M,N)}{2^M},$$
which may be written as the sum of the first $N$ terms in the expansion of the binomial $(1/2 + 1/2)^{M-1}$. By the symmetry of the binomial coefficients, that sum is equal to $1/2$ for $M = 2N$.
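The symmetry argument can be verified numerically; here is a short sketch (the helper name p_ls is again our own) showing that the separability probability is exactly $1/2$ at $M = 2N$, for any $N$:

```python
from math import comb

def p_ls(M: int, N: int) -> float:
    """P_LS = D(M, N) / 2^M = sum_{m=0}^{N-1} C(M-1, m) / 2^(M-1)."""
    return sum(comb(M - 1, m) for m in range(N)) / 2 ** (M - 1)

for N in (5, 20, 100):
    print(N, p_ls(2 * N, N))  # prints exactly 0.5 for every N
```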
The probability $P_{LS}(L_M)$ is shown in Fig. 6.21 for different values of $M$ and $N$.
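Fig. 6.21 is not reproduced here; the sketch below, assuming matplotlib is available, regenerates curves of the same kind ($P_{LS}$ against $\alpha = M/N$ for several $N$) and makes visible how the transition sharpens around $\alpha = 2$ as $N$ grows:

```python
import matplotlib.pyplot as plt
from math import comb

def p_ls(M, N):
    # Sum of the first N terms of the binomial expansion of (1/2 + 1/2)^(M-1)
    return sum(comb(M - 1, m) for m in range(N)) / 2 ** (M - 1)

for N in (5, 20, 100):
    Ms = range(1, 4 * N + 1)
    plt.plot([M / N for M in Ms], [p_ls(M, N) for M in Ms], label=f"N = {N}")
plt.axvline(2.0, linestyle="--", color="gray")  # P_LS = 1/2 at alpha = 2
plt.xlabel("alpha = M / N")
plt.ylabel("P_LS")
plt.legend()
plt.show()
```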