One limit that helps in understanding the properties of learning is to take N → ∞, M → ∞, keeping α = M/N constant. This limit is interesting because the relative training set size α, which is the ratio of the number of examples to the dimension of the input space, is a meaningful quantity in actual (finite-size) applications. Keeping the ratio of the number of training patterns to the network's number of parameters (instead of N) constant also provides interesting limits. Those limits are generally (although not necessarily) studied with methods of statistical mechanics [Engel 2001]. In that framework, they are known as the thermodynamic limit. Interestingly, the typical properties evaluated in this limit remain valid for M and N large but finite.
6.7.5.1 The Typical Capacity of the Perceptron
The capacity of a network is the largest number of patterns the network can
learn to discriminate, with probability 1, irrespective of the discrimination
task, i.e., whatever the labels y_k of the examples. We mentioned above that
the VC-dimension of the perceptron is N + 1: in other words, if M < N + 1,
M examples in general position can always be separated by a perceptron.
However, the probability that more than N + 1 points are separable does not
vanish abruptly at M = N + 1. The typical capacity of a perceptron was first
determined by Cover in 1965 through an inductive geometric argument. He
counted the number of dichotomies of M points that a perceptron can perform
in a space of N dimensions. Note that a perceptron performs the dichotomies
by means of oriented hyperplanes; in the following, the two orientations of
the hyperplane are considered as two different dichotomies. In general, as we
have seen, the same dichotomy of L_M (the same assignment of classes to
the M points) can be performed by many different hyperplanes (indeed, an
infinite number of them), but it is counted only once. In the particular case of a perceptron
without threshold (nor bias), the following result is obtained: for M < N, the
number of linearly separable dichotomies of M points in dimension N is
D(M, N) = 2^M. For M > N, the result is

D(M, N) = 2 \sum_{m=0}^{N-1} \binom{M-1}{m}.
The above result is a geometrical property of points in an N-dimensional
space, irrespective of the training algorithm.
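As an illustration, Cover's counting formula can be checked numerically. The short Python sketch below (the helper name cover_dichotomies is introduced here for illustration only, and is not from the text) evaluates D(M, N) and verifies that it reduces to 2^M whenever M ≤ N, i.e., that every dichotomy is then realizable.

```python
from math import comb

def cover_dichotomies(M: int, N: int) -> int:
    """D(M, N): number of linearly separable dichotomies of M points in
    general position in R^N, for a perceptron without threshold (bias),
    i.e., dichotomies by oriented hyperplanes through the origin."""
    return 2 * sum(comb(M - 1, m) for m in range(N))

if __name__ == "__main__":
    # For M <= N the formula gives 2^M: all dichotomies are separable.
    for N in (5, 10, 20):
        for M in range(1, N + 1):
            assert cover_dichotomies(M, N) == 2 ** M

    # Beyond M = N only a fraction of the 2^M dichotomies remains separable;
    # at M = 2N that fraction is exactly one half (see below).
    print(cover_dichotomies(20, 10), 2 ** 20)
```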
Since the total number of possible dichotomies of M points is 2^M, the probability P_LS(L_M) that a set of M points in N dimensions is linearly separable is

P_LS(L_M) = \frac{D(M, N)}{2^M},

which may be written as the sum of the first N terms in the expansion of the binomial (1/2 + 1/2)^{M-1}. That sum is equal to 1/2 for N = M/2, i.e., for α = M/N = 2. The probability P_LS(L_M) is shown in Fig. 6.21 for different values of M and N.
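The probability P_LS(L_M) can likewise be tabulated directly from D(M, N). The following minimal Python sketch (the name p_linearly_separable is, again, ours) checks that P_LS equals 1/2 exactly at N = M/2 and illustrates the qualitative content of Fig. 6.21: at fixed α = M/N, the drop of P_LS around α = 2 becomes increasingly sharp as N grows.

```python
from math import comb

def p_linearly_separable(M: int, N: int) -> float:
    """P_LS(L_M) = D(M, N) / 2^M: probability that a random dichotomy of M
    points in general position in R^N is realizable by a perceptron
    without threshold.  Equals the sum of the first N terms of the
    binomial expansion of (1/2 + 1/2)^(M-1)."""
    if M <= N:
        return 1.0
    return sum(comb(M - 1, m) for m in range(N)) / 2 ** (M - 1)

if __name__ == "__main__":
    # P_LS is exactly 1/2 at M = 2N, i.e., at alpha = M/N = 2.
    for N in (5, 20, 100):
        assert abs(p_linearly_separable(2 * N, N) - 0.5) < 1e-12

    # At fixed alpha the transition sharpens as N grows (cf. Fig. 6.21).
    for N in (5, 20, 100):
        row = [round(p_linearly_separable(round(a * N), N), 3)
               for a in (1.0, 1.5, 2.0, 2.5, 3.0)]
        print(f"N = {N:3d}:", row)
```

In the thermodynamic limit discussed above, P_LS(L_M) thus tends to a step function of α located at α = 2, which is the sense in which 2N is the typical capacity of the perceptron.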