One limit that helps in understanding the properties of learning is to take N → ∞, M → ∞, keeping α = M/N constant. This limit is interesting because the relative training set size α, which is the ratio of the number of examples to the dimension of the input space, is a meaningful quantity in actual (finite-size) applications. Keeping the ratio of the number of training patterns to the network's number of parameters (instead of N) constant also provides interesting limits. Those limits are generally (although not necessarily) studied with methods of statistical mechanics [Engel 2001]. In that framework, they are known as the thermodynamic limit. Interestingly, the typical properties evaluated in this limit remain valid for M and N large but finite.
6.7.5.1 The Typical Capacity of the Perceptron
The capacity of a network is the largest number of patterns the network can
learn to discriminate, with probability 1, irrespective of the discrimination
task, i.e., whatever the labels y_k of the examples. We mentioned above that
the VC-dimension of the perceptron is N + 1: in other words, if M < N + 1,
M examples in general position can always be separated by a perceptron.
However, the probability that more than N + 1 points are separable does not
vanish abruptly at M = N + 1. The typical capacity of a perceptron was first
determined by Cover in 1965 through an inductive geometric argument. He
counted the number of dichotomies of M points that a perceptron can perform
in a space of N dimensions. Note that a perceptron performs the dichotomies
by means of oriented hyperplanes; in the following, the two orientations of
the hyperplane are considered as two different dichotomies. In general, as we
have seen, the same dichotomy of L_M (the same assignment of classes to
the M points) can be performed by many different hyperplanes (indeed, an
infinite number of them), but it is counted only once. In the particular case of a perceptron
without threshold (nor bias), the following result is obtained: for M < N, the
number of linearly separable dichotomies of M points in dimension N is
D(M, N) = 2^M. For M > N, the result is

D(M, N) = 2 \sum_{m=0}^{N-1} \binom{M-1}{m}.
The above result is a geometrical property of points in an N-dimensional
space, irrespective of the training algorithm.
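As an illustration, Cover's counting formula can be checked numerically. The short Python sketch below (the helper name cover_dichotomies is introduced here for illustration only, and is not from the text) evaluates D(M, N) and verifies that it reduces to 2^M whenever M ≤ N, i.e., that every dichotomy is then realizable.

```python
from math import comb

def cover_dichotomies(M: int, N: int) -> int:
    """D(M, N): number of linearly separable dichotomies of M points in
    general position in R^N, for a perceptron without threshold (bias),
    i.e., dichotomies by oriented hyperplanes through the origin."""
    return 2 * sum(comb(M - 1, m) for m in range(N))

if __name__ == "__main__":
    # For M <= N the formula gives 2^M: all dichotomies are separable.
    for N in (5, 10, 20):
        for M in range(1, N + 1):
            assert cover_dichotomies(M, N) == 2 ** M

    # Beyond M = N only a fraction of the 2^M dichotomies remains separable;
    # at M = 2N that fraction is exactly one half (see below).
    print(cover_dichotomies(20, 10), 2 ** 20)
```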
Since the total number of possible dichotomies of M points is 2^M, the probability P_LS(L_M) that a set of M points in N dimensions is linearly separable is

P_LS(L_M) = \frac{D(M, N)}{2^M},

which may be written as the sum of the first N terms in the expansion of the binomial (1/2 + 1/2)^{M-1}. That sum is equal to 1/2 for N = M/2, i.e., for α = M/N = 2. The probability P_LS(L_M) is shown in Fig. 6.21 for different values of M and N.
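The probability P_LS(L_M) can likewise be tabulated directly from D(M, N). The following minimal Python sketch (the name p_linearly_separable is, again, ours) checks that P_LS equals 1/2 exactly at N = M/2 and illustrates the qualitative content of Fig. 6.21: at fixed α = M/N, the drop of P_LS around α = 2 becomes increasingly sharp as N grows.

```python
from math import comb

def p_linearly_separable(M: int, N: int) -> float:
    """P_LS(L_M) = D(M, N) / 2^M: probability that a random dichotomy of M
    points in general position in R^N is realizable by a perceptron
    without threshold.  Equals the sum of the first N terms of the
    binomial expansion of (1/2 + 1/2)^(M-1)."""
    if M <= N:
        return 1.0
    return sum(comb(M - 1, m) for m in range(N)) / 2 ** (M - 1)

if __name__ == "__main__":
    # P_LS is exactly 1/2 at M = 2N, i.e., at alpha = M/N = 2.
    for N in (5, 20, 100):
        assert abs(p_linearly_separable(2 * N, N) - 0.5) < 1e-12

    # At fixed alpha the transition sharpens as N grows (cf. Fig. 6.21).
    for N in (5, 20, 100):
        row = [round(p_linearly_separable(round(a * N), N), 3)
               for a in (1.0, 1.5, 2.0, 2.5, 3.0)]
        print(f"N = {N:3d}:", row)
```

In the thermodynamic limit discussed above, P_LS(L_M) thus tends to a step function of α located at α = 2, which is the sense in which 2N is the typical capacity of the perceptron.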