The training set is a sufficient constraint for the student network only if $M > M_{VC}$. Otherwise, either the network is too complex to capture the regularities of the task to be learnt, or, equivalently, the number of examples is not large enough. Therefore, it is important to know the network's VC dimension $M_{VC}$. In the case of the perceptron with $N$ inputs and one threshold (or bias), with $M$ examples to learn, the $N+1$ components of the weight vector $\mathbf{w}$ must satisfy the $M$ inequalities $z_k(\mathbf{w}) > 0$ ($k = 1, \dots, M$). Now, the maximal number of compatible independent inequalities (that is, inequalities having a nontrivial common solution) is $N+1$. If there are more than $N+1$ inequalities, the system may become incompatible: we cannot guarantee that a solution exists for arbitrary training sets if $M > N+1$; in fact, a solution then exists only if the training set is linearly separable. Thus, the VC dimension of the perceptron is
$$ M_{VC} = N + 1. $$
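To make the counting argument above concrete, here is a minimal sketch (not from the original text; it assumes NumPy and SciPy are available) that tests, by linear-programming feasibility, whether a perceptron with a bias can realize every dichotomy of a set of points. For points in general position, every dichotomy of $N+1$ points turns out to be separable, whereas some dichotomy of $N+2$ points never is, in agreement with $M_{VC} = N+1$.

```python
import numpy as np
from itertools import product
from scipy.optimize import linprog

def separable(X, y):
    """Check whether labels y in {-1,+1} are realizable by a perceptron with
    bias on inputs X (shape M x N): find (w, b) with y_k (w.x_k + b) >= 1
    for all k, posed as a linear-programming feasibility problem."""
    M, N = X.shape
    A = np.hstack([X, np.ones((M, 1))])       # augment inputs with a bias component
    A_ub = -(y[:, None] * A)                  # -y_k (x_k, 1) . (w, b) <= -1
    b_ub = -np.ones(M)
    res = linprog(np.zeros(N + 1), A_ub=A_ub, b_ub=b_ub,
                  bounds=[(None, None)] * (N + 1), method="highs")
    return res.status == 0                    # feasible  <=>  dichotomy separable

def shatters(X):
    """True if every dichotomy of the rows of X is linearly separable."""
    M = X.shape[0]
    return all(separable(X, np.array(labels))
               for labels in product([-1, 1], repeat=M))

rng = np.random.default_rng(0)
N = 4                                              # illustrative input dimension
print(shatters(rng.standard_normal((N + 1, N))))   # N+1 points in general position: True
print(shatters(rng.standard_normal((N + 2, N))))   # N+2 points: False (cannot be shattered)
```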
Since $M > M_{VC}$ is necessary to have good generalization, a lot of theoretical effort has been devoted to determining the Vapnik-Chervonenkis dimension of neural networks. However, for networks more complex than the perceptron, only approximations of $M_{VC}$ for particular architectures are available. For example, it has been established [Baum 1989] that the VC dimension of networks with one hidden layer of $H$ neurons, having $N_W = (N+1)H + (H+1)$ weights (including the biases), satisfies
$$ 2N \left\lfloor \frac{H}{2} \right\rfloor \;\le\; M_{VC} \;\le\; 2 N_W \log_2(eH), $$
where $\lfloor \cdot \rfloor$ stands for the integer part and $e$ is the base of the natural logarithm. The left-hand side of the above relation states that, if we have $M$ examples, we should use a number of hidden neurons $H \lesssim M/N$. This result is somewhat disappointing, as it simply tells us that the number of parameters of the network (which is of order $NH$) should be smaller than the number of patterns.
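As a rough numerical illustration (not part of the source), the two sides of the bound can be evaluated directly; the values of $N$ and $H$ below are arbitrary illustrative choices.

```python
import numpy as np

def baum_bounds(N, H):
    """Bounds on the VC dimension of a one-hidden-layer threshold network
    with N inputs, H hidden neurons and N_W = (N+1)H + (H+1) weights:
    2N*floor(H/2) <= M_VC <= 2*N_W*log2(e*H)."""
    N_W = (N + 1) * H + (H + 1)
    lower = 2 * N * (H // 2)
    upper = 2 * N_W * np.log2(np.e * H)
    return lower, upper

for H in (2, 10, 50):
    lo, up = baum_bounds(N=100, H=H)
    print(f"H={H:3d}: {lo} <= M_VC <= {up:.0f}")
```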
6.7.5 Prediction of the Typical Behavior
A different theoretical approach characterizes learning problems by their typical properties, i.e., properties that hold with probability 1: the probability that a system does not obey the predicted behavior vanishes. This is similar in spirit to the law of large numbers, which states that the average of $N$ independent, identically distributed random variables tends, with probability 1, to the expectation of the random variable when $N \to \infty$. Along the same lines, Vapnik's theory states the conditions under which the typical gap between $\varepsilon_t$ and $\varepsilon_g$ is arbitrarily small in the asymptotic limit $M \to \infty$. Typical properties are thus asymptotic properties.
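A minimal simulation sketch of this typical behavior, under an assumed teacher-student perceptron scenario with random Gaussian inputs (the scenario, the teacher vector and all parameter values below are illustrative assumptions, not taken from the text): as the number of examples $M$ grows, the gap between the training error $\varepsilon_t$ and a test-set estimate of the generalization error $\varepsilon_g$ shrinks.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 20                                   # illustrative input dimension
teacher = rng.standard_normal(N)         # "teacher" weight vector defines the task

def perceptron_fit(X, y, epochs=200):
    """Plain perceptron learning rule; returns the student weight vector."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for x, t in zip(X, y):
            if t * (w @ x) <= 0:         # misclassified (or on the boundary): update
                w += t * x
    return w

# Large held-out set used to estimate the generalization error
X_test = rng.standard_normal((20000, N))
y_test = np.sign(X_test @ teacher)

for M in (20, 100, 500, 2000):
    X = rng.standard_normal((M, N))
    y = np.sign(X @ teacher)             # linearly separable training set
    w = perceptron_fit(X, y)
    eps_t = np.mean(np.sign(X @ w) != y)              # training error
    eps_g = np.mean(np.sign(X_test @ w) != y_test)    # generalization error estimate
    print(f"M={M:5d}  eps_t={eps_t:.3f}  eps_g={eps_g:.3f}  gap={eps_g - eps_t:.3f}")
```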