2.5.3 Adaptive (On-Line) Training of Models that Are Nonlinear
with Respect to Their Parameters
In the previous sections, we discussed methods that optimize the least squares
cost function by using all the training data available at the beginning of
training: the gradient of the total cost can be computed as the sum of the
gradients of the partial costs.
In adaptive (on-line) training, the parameters are updated by using the gradient of the partial cost for each example, so that training can start even before all the training data are available. Such a procedure is often useful for updating a model after an initial nonadaptive training. Those methods are discussed in detail in Chap. 4.
A variant of adaptive training algorithms consists of updating the parameters after receiving a block of data (“block training”): the partial cost is then related not to a single example, but to a block of examples.
The most popular adaptive training technique is called stochastic gradient, whereby the parameter updates are proportional to the gradient of the partial cost,

w_{k+1} = w_k - \mu_k \nabla J^k(w_k),

where w_k is the value of the vector of parameters after iteration k (i.e., after updating the parameters from example k) and \mu_k is the training rate at iteration k. Note that the LMS algorithm, discussed in the framework of the training of linear models, is a particular case of stochastic gradient.
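As an illustration, here is a minimal sketch of the stochastic gradient update in Python, using an assumed toy model f(x; w) = w0 tanh(w1 x + w2) that is nonlinear in its parameters; the function names, the constant training rate mu, and the block_size parameter (which, when greater than 1, gives the block-training variant described above) are illustrative choices, not part of the original text.

```python
import numpy as np

def model(w, x):
    # Toy model, nonlinear in its parameters: f(x; w) = w0 * tanh(w1*x + w2)
    return w[0] * np.tanh(w[1] * x + w[2])

def partial_gradient(w, x, y):
    # Gradient of the partial cost J^k(w) = 0.5 * (y - f(x; w))**2
    t = np.tanh(w[1] * x + w[2])
    sech2 = 1.0 - t ** 2                   # derivative of tanh
    err = model(w, x) - y
    return err * np.array([t, w[0] * sech2 * x, w[0] * sech2])

def stochastic_gradient(w, stream, mu=0.05, block_size=1):
    # w_{k+1} = w_k - mu * grad J^k(w_k), applied after each example
    # (block_size=1, pure stochastic gradient) or after each block of
    # examples (block training); a decreasing mu_k could replace mu.
    block = []
    for x, y in stream:
        block.append((x, y))
        if len(block) == block_size:
            g = sum(partial_gradient(w, xi, yi) for xi, yi in block)
            w = w - mu * g
            block = []
    return w

# Usage: examples are processed as they arrive, so training can start
# before the whole training set is available.
rng = np.random.default_rng(0)
xs = rng.uniform(-2.0, 2.0, size=500)
ys = 1.5 * np.tanh(0.8 * xs - 0.3) + 0.05 * rng.standard_normal(500)
w = stochastic_gradient(np.array([0.5, 0.5, 0.0]), zip(xs, ys))
```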
Some empirical results suggest that the stochastic gradient method avoids local minima more efficiently than simple gradient descent in batch learning.
An alternative technique, stemming from adaptive filtering, can be used for neural network training: the extended Kalman filter [Puskorius et al. 1994]. It is more efficient than stochastic gradient in terms of convergence speed, but the number of operations per iteration is higher. That approach is described in detail in Chap. 4.
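A minimal sketch of the idea follows, for a scalar-output model (the same assumed toy model as above): the parameter vector plays the role of the state, with trivial dynamics, the model output is the measurement, and each incoming example triggers one Kalman update. The noise variances R and Q are illustrative values; [Puskorius et al. 1994] treat the full recurrent-network case, which requires more elaborate bookkeeping.

```python
import numpy as np

def model(w, x):
    # Same toy model as above: f(x; w) = w0 * tanh(w1*x + w2)
    return w[0] * np.tanh(w[1] * x + w[2])

def jacobian(w, x):
    # Row Jacobian H_k = d f(x; w) / d w of the model output
    t = np.tanh(w[1] * x + w[2])
    sech2 = 1.0 - t ** 2
    return np.array([t, w[0] * sech2 * x, w[0] * sech2])

def ekf_step(w, P, x, y, R=0.01, Q=1e-6):
    # One extended-Kalman-filter update: the parameters are the state
    # (with trivial dynamics w_{k+1} = w_k), the output is the measurement.
    H = jacobian(w, x)                    # shape (n,)
    S = H @ P @ H + R                     # innovation variance (scalar)
    K = P @ H / S                         # Kalman gain, shape (n,)
    w = w + K * (y - model(w, x))         # parameter update
    P = P - np.outer(K, H @ P) + Q * np.eye(len(w))   # covariance update
    return w, P

# Usage: one update per incoming example, as in adaptive training.
rng = np.random.default_rng(0)
xs = rng.uniform(-2.0, 2.0, size=200)
ys = 1.5 * np.tanh(0.8 * xs - 0.3) + 0.05 * rng.standard_normal(200)
w, P = np.array([0.5, 0.5, 0.0]), np.eye(3)
for x, y in zip(xs, ys):
    w, P = ekf_step(w, P, x, y)
```

Each update manipulates the n × n covariance matrix P, so the cost per iteration grows with n², whereas stochastic gradient only requires O(n) operations; this is the price paid for the faster convergence.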
2.5.4 Training with Regularization
As stated in Chap. 1, the objective of black-box modeling is the design of a
model that is complex enough to learn the training data, but does not exhibit
overfitting, i.e., does not adjust to noise. Two categories of strategies can be
used.
Passive techniques: several models, of different complexities, are trained as indicated in the previous section, and a selection among those models is performed after training, in order to discard models that exhibit overfitting; that is done by cross-validation or statistical tests, as explained in the next section (a sketch of this selection procedure follows below).
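As a sketch of the passive strategy, under illustrative assumptions (polynomial models of increasing degree stand in for models of different complexities, such as networks with increasing numbers of hidden neurons, and the data are synthetic):

```python
import numpy as np

def cross_validation_score(x, y, degree, n_folds=5):
    # Mean validation MSE over n_folds folds for a polynomial model of
    # the given degree (a stand-in for a model of a given complexity).
    idx = np.arange(len(x))
    scores = []
    for val in np.array_split(idx, n_folds):
        train = np.setdiff1d(idx, val)
        coeffs = np.polyfit(x[train], y[train], degree)   # training
        resid = y[val] - np.polyval(coeffs, x[val])       # validation
        scores.append(np.mean(resid ** 2))
    return np.mean(scores)

# Train candidate models of increasing complexity, then keep the one
# with the smallest cross-validation error: overly complex models that
# adjust to the noise are discarded.
rng = np.random.default_rng(0)
x = np.sort(rng.uniform(-1.0, 1.0, size=40))
y = np.sin(3.0 * x) + 0.1 * rng.standard_normal(40)
best_degree = min(range(1, 10), key=lambda d: cross_validation_score(x, y, d))
```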