case of the sonar signals, that probability is not negligible. Those theorems,
whose importance has already been mentioned in the first chapter, are further
discussed at the end of this chapter.
Remark 2. One may wonder why the fact that the sonar database is linearly
separable was not discovered earlier, since we have already shown in Chap. 1
that the algorithm of Ho and Kashyap [Ho 1965] provides the answer in a few
minutes. That is a consequence of the multidisciplinary character of the field
of neural networks: important results are frequently rediscovered. The authors
of this book hope that it will contribute to overcoming such problems.
6.4.6 Adaptive (On-Line) Training Algorithms
Adaptive algorithms update the weights after the presentation of each example, just as the perceptron algorithm does. As already pointed out in previous chapters, adaptive training is useful when the training set is too large to be stored in the computer memory—as required by the optimization algorithms described above—or in problems where the examples are available one at a time, as is the case when a robot explores its environment.
As mentioned in Chaps. 2 and 4, adaptive training can be performed by updating the weights proportionally to the derivative of the partial costs defined in the previous section. Such implementations are called stochastic gradient descent methods, since the true gradient is replaced by a stochastic term whose average is equal to the gradient. The stochasticity stems from the more or less arbitrary order of presentation of the examples; different orderings may end up with different, but statistically equivalent, results.
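To make the procedure concrete, the following Python sketch (not taken from the text: the function name, the learning rate, and the choice of the perceptron partial cost V(z_k) = max(0, −z_k), with z_k = y_k w · x_k, are assumptions made for illustration) updates the weights after each example, following the negative derivative of the partial cost:

```python
import numpy as np

def online_gradient_training(X, y, learning_rate=0.1, n_epochs=50, seed=0):
    """Adaptive (on-line) training by stochastic gradient descent.

    X : (N, d) array of examples; y : (N,) array of labels in {-1, +1}.
    The partial cost assumed here is the perceptron cost
    V(z_k) = max(0, -z_k), with z_k = y_k * w.x_k, whose derivative
    with respect to w is -y_k * x_k when z_k < 0 and 0 otherwise
    (the subgradient -y_k * x_k is also used at z_k = 0).
    """
    rng = np.random.default_rng(seed)
    n_examples, dim = X.shape
    w = np.zeros(dim)
    for _ in range(n_epochs):
        # The (more or less arbitrary) order of presentation of the
        # examples is the source of the stochasticity mentioned above.
        for k in rng.permutation(n_examples):
            z_k = y[k] * (X[k] @ w)
            if z_k <= 0:  # example k is not (yet) correctly classified
                w -= learning_rate * (-y[k] * X[k])  # w <- w - lr * dV/dw
    return w
```

With a learning rate equal to 1, this sketch reduces to the perceptron algorithm; another partial cost that depends on w only through z_k merely changes the factor multiplying y_k x_k, as made explicit in the next section.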
Among the on-line learning algorithms for the perceptron, we mention
Minover [Krauth et al. 1987] and Adatron [Anlauf et al. 1989], which achieve
better performance than the perceptron algorithm. In fact, Adatron is an
adaptive version of the relaxation algorithm described above.
6.4.7 An Interpretation of Training in Terms of Forces
In this section, we interpret training in terms of forces exerted by the examples on the hyperplane; that interpretation provides insight into the non-convergence of some algorithms when the training set is not linearly separable.
Given the orientation of the hyperplane at iteration t , the contribution of
an example k to the weight update may be interpreted as a force
\[
\mathbf{F}_k(t) \;=\; -\,\frac{\partial V(z_k)}{\partial \mathbf{w}}(t)
\;=\; -\,\frac{\partial V(z_k)}{\partial z_k}(t)\, y_k\, \mathbf{x}_k
\;=\; c_k(t)\, y_k\, \mathbf{x}_k .
\]
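For concreteness (the particular partial cost below is an illustrative assumption, not a choice made at this point of the text), take the perceptron partial cost of Chap. 1, V(z_k) = max(0, −z_k). Then

\[
c_k(t) \;=\; -\,\frac{\partial V(z_k)}{\partial z_k}(t) \;=\;
\begin{cases}
1 & \text{if } z_k(t) < 0 \ \text{(example $k$ wrongly classified)},\\
0 & \text{otherwise},
\end{cases}
\]

so that, at iteration t, only the wrongly classified examples exert a force y_k x_k on the hyperplane, pulling it towards classifying them correctly, while correctly classified examples exert no force.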