The training set is a sufficient constraint for the student network only if $M > M_{VC}$. Otherwise, either the network is too complex to capture the regularities of the task to be learnt, or, equivalently, the number of examples is not large enough. Therefore, it is important to know the network's VC dimension $M_{VC}$. In the case of the perceptron with $N$ inputs and one threshold (or bias), with $M$ examples to learn, the $N+1$ components of the weight vector $\mathbf{w}$ must satisfy the $M$ inequalities $z_k(\mathbf{w}) > 0$ ($k = 1, \dots, M$). Now, the maximal number of compatible independent inequalities (that is, inequalities having a nontrivial common solution) is $N+1$. If there are more than $N+1$ inequalities, the system may become incompatible: we cannot guarantee that a solution exists for arbitrary training sets if $M > N+1$; in fact, a solution then exists only if the training set is linearly separable. Thus, the VC dimension of the perceptron is
$$ M_{VC} = N + 1. $$
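To make the counting argument above concrete, here is a minimal sketch (not from the original text; it assumes NumPy and SciPy are available) that tests, by linear-programming feasibility, whether a perceptron with a bias can realize every dichotomy of a set of points. For points in general position, every dichotomy of $N+1$ points turns out to be separable, whereas some dichotomy of $N+2$ points never is, in agreement with $M_{VC} = N+1$.

```python
import numpy as np
from itertools import product
from scipy.optimize import linprog

def separable(X, y):
    """Check whether labels y in {-1,+1} are realizable by a perceptron with
    bias on inputs X (shape M x N): find (w, b) with y_k (w.x_k + b) >= 1
    for all k, posed as a linear-programming feasibility problem."""
    M, N = X.shape
    A = np.hstack([X, np.ones((M, 1))])       # augment inputs with a bias component
    A_ub = -(y[:, None] * A)                  # -y_k (x_k, 1) . (w, b) <= -1
    b_ub = -np.ones(M)
    res = linprog(np.zeros(N + 1), A_ub=A_ub, b_ub=b_ub,
                  bounds=[(None, None)] * (N + 1), method="highs")
    return res.status == 0                    # feasible  <=>  dichotomy separable

def shatters(X):
    """True if every dichotomy of the rows of X is linearly separable."""
    M = X.shape[0]
    return all(separable(X, np.array(labels))
               for labels in product([-1, 1], repeat=M))

rng = np.random.default_rng(0)
N = 4                                              # illustrative input dimension
print(shatters(rng.standard_normal((N + 1, N))))   # N+1 points in general position: True
print(shatters(rng.standard_normal((N + 2, N))))   # N+2 points: False (cannot be shattered)
```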
Since $M > M_{VC}$ is necessary to have good generalization, a lot of theoretical effort has been devoted to determining the Vapnik-Chervonenkis dimension of neural networks. However, for networks more complex than the perceptron, only approximations of $M_{VC}$ for particular architectures are available. For example, it has been established [Baum 1989] that the VC dimension of networks with one hidden layer of $H$ neurons, having $N_W = (N+1)H + (H+1)$ weights (including the biases), satisfies
$$ 2N \left\lfloor \frac{H}{2} \right\rfloor \;\le\; M_{VC} \;\le\; 2 N_W \log_2(eH), $$
where $\lfloor \cdot \rfloor$ stands for the integer part and $e$ is the base of the natural logarithm. The left-hand side of the above relation states that, if we have $M$ examples, we should use a number of hidden neurons $H \lesssim M/N$. This result is somewhat disappointing, as it simply tells us that the number of parameters of the network (which is of order $NH$) should be smaller than the number of patterns.
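As a rough numerical illustration (not part of the source), the two sides of the bound can be evaluated directly; the values of $N$ and $H$ below are arbitrary illustrative choices.

```python
import numpy as np

def baum_bounds(N, H):
    """Bounds on the VC dimension of a one-hidden-layer threshold network
    with N inputs, H hidden neurons and N_W = (N+1)H + (H+1) weights:
    2N*floor(H/2) <= M_VC <= 2*N_W*log2(e*H)."""
    N_W = (N + 1) * H + (H + 1)
    lower = 2 * N * (H // 2)
    upper = 2 * N_W * np.log2(np.e * H)
    return lower, upper

for H in (2, 10, 50):
    lo, up = baum_bounds(N=100, H=H)
    print(f"H={H:3d}: {lo} <= M_VC <= {up:.0f}")
```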
6.7.5 Prediction of the Typical Behavior
A different theoretical approach characterizes learning problems by their typical properties, i.e., properties that hold with probability 1: the probability that a system does not obey the predicted behavior vanishes. This is similar in spirit to the law of large numbers, which states that the average of $N$ independent, identically distributed random variables tends, with probability 1, to the expectation of the random variable when $N \to \infty$. Along the same lines, Vapnik's theory states the conditions under which the typical gap between $\varepsilon_t$ and $\varepsilon_g$ is arbitrarily small in the asymptotic limit $M \to \infty$. Typical properties are thus asymptotic properties.
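A minimal simulation sketch of this typical behavior, under an assumed teacher-student perceptron scenario with random Gaussian inputs (the scenario, the teacher vector and all parameter values below are illustrative assumptions, not taken from the text): as the number of examples $M$ grows, the gap between the training error $\varepsilon_t$ and a test-set estimate of the generalization error $\varepsilon_g$ shrinks.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 20                                   # illustrative input dimension
teacher = rng.standard_normal(N)         # "teacher" weight vector defines the task

def perceptron_fit(X, y, epochs=200):
    """Plain perceptron learning rule; returns the student weight vector."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for x, t in zip(X, y):
            if t * (w @ x) <= 0:         # misclassified (or on the boundary): update
                w += t * x
    return w

# Large held-out set used to estimate the generalization error
X_test = rng.standard_normal((20000, N))
y_test = np.sign(X_test @ teacher)

for M in (20, 100, 500, 2000):
    X = rng.standard_normal((M, N))
    y = np.sign(X @ teacher)             # linearly separable training set
    w = perceptron_fit(X, y)
    eps_t = np.mean(np.sign(X @ w) != y)              # training error
    eps_g = np.mean(np.sign(X_test @ w) != y_test)    # generalization error estimate
    print(f"M={M:5d}  eps_t={eps_t:.3f}  eps_g={eps_g:.3f}  gap={eps_g - eps_t:.3f}")
```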