case of the sonar signals, that probability is not negligible. Those theorems,
whose importance has already been mentioned in the first chapter, are further
discussed at the end of this chapter.
Remark 2. One may wonder why the fact that the sonar database is linearly
separable was not discovered earlier, since we have already shown in Chap. 1
that the algorithm of Ho and Kashyap [Ho 1965] provides the answer in a few
minutes. That is a consequence of the multidisciplinary character of the field
of neural networks: important results are frequently rediscovered. The authors
of this book hope that it will contribute to overcoming such problems.
6.4.6 Adaptive (On-Line) Training Algorithms
Adaptive algorithms update the weights after the presentation of each example, just as the perceptron algorithm does. As already pointed out in previous chapters, adaptive training is useful when the training set is too large to be stored in the computer memory—as required by the optimization algorithms described above—or in problems where the examples are available one at a time, as is the case when a robot explores its environment.
As mentioned in Chaps. 2 and 4, adaptive training can be performed by updating the weights proportionally to the derivative of the partial costs defined in the previous section. Such implementations are called stochastic gradient descent methods, since the true gradient is replaced by a stochastic term whose average is equal to the gradient. The stochasticity stems from the more or less arbitrary order of presentation of the examples; different orderings may end up with different, but statistically equivalent, results.
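To make the procedure concrete, the following Python sketch (not taken from the text: the function name, the learning rate, and the choice of the perceptron partial cost V(z_k) = max(0, −z_k), with z_k = y_k w · x_k, are assumptions made for illustration) updates the weights after each example, following the negative derivative of the partial cost:

```python
import numpy as np

def online_gradient_training(X, y, learning_rate=0.1, n_epochs=50, seed=0):
    """Adaptive (on-line) training by stochastic gradient descent.

    X : (N, d) array of examples; y : (N,) array of labels in {-1, +1}.
    The partial cost assumed here is the perceptron cost
    V(z_k) = max(0, -z_k), with z_k = y_k * w.x_k, whose derivative
    with respect to w is -y_k * x_k when z_k < 0 and 0 otherwise
    (the subgradient -y_k * x_k is also used at z_k = 0).
    """
    rng = np.random.default_rng(seed)
    n_examples, dim = X.shape
    w = np.zeros(dim)
    for _ in range(n_epochs):
        # The (more or less arbitrary) order of presentation of the
        # examples is the source of the stochasticity mentioned above.
        for k in rng.permutation(n_examples):
            z_k = y[k] * (X[k] @ w)
            if z_k <= 0:  # example k is not (yet) correctly classified
                w -= learning_rate * (-y[k] * X[k])  # w <- w - lr * dV/dw
    return w
```

With a learning rate equal to 1, this sketch reduces to the perceptron algorithm; another partial cost that depends on w only through z_k merely changes the factor multiplying y_k x_k, as made explicit in the next section.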
Among the on-line learning algorithms for the perceptron, we mention
Minover [Krauth et al. 1987] and Adatron [Anlauf et al. 1989], which achieve
better performance than the perceptron algorithm. In fact, Adatron is an
adaptive version of the relaxation algorithm described above.
6.4.7 An Interpretation of Training in Terms of Forces
In this section, we interpret training in terms of forces exerted by the examples on the hyperplane; that interpretation provides insight into the non-convergence of some algorithms when the training set is not linearly separable.
Given the orientation of the hyperplane at iteration t , the contribution of
an example k to the weight update may be interpreted as a force
\[
\mathbf{F}_k(t) \;=\; -\,\frac{\partial V(z_k)}{\partial \mathbf{w}}(t)
\;=\; -\,\frac{\partial V(z_k)}{\partial z_k}(t)\, y_k\, \mathbf{x}_k
\;=\; c_k(t)\, y_k\, \mathbf{x}_k .
\]
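For concreteness (the particular partial cost below is an illustrative assumption, not a choice made at this point of the text), take the perceptron partial cost of Chap. 1, V(z_k) = max(0, −z_k). Then

\[
c_k(t) \;=\; -\,\frac{\partial V(z_k)}{\partial z_k}(t) \;=\;
\begin{cases}
1 & \text{if } z_k(t) < 0 \ \text{(example $k$ wrongly classified)},\\
0 & \text{otherwise},
\end{cases}
\]

so that, at iteration t, only the wrongly classified examples exert a force y_k x_k on the hyperplane, pulling it towards classifying them correctly, while correctly classified examples exert no force.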