function related to example $k$,

$$J^k(w) = \big( y_p^k - g(\zeta^k, w) \big)^2,$$
where $w_k$ denotes the value of the vector of parameters after iteration $k$, i.e., after the parameter update related to example $k$. The algorithm is
$$w_{k+1} = w_k + \mu_k \big( y_p^{k+1} - (x^{k+1})^T w_k \big)\, x^{k+1},$$
where $\mu_k$ is a sequence of positive numbers (for instance $\mu_k$ = constant, or $\mu_k = 1/(\alpha + \beta k)$). Note that $y_p^{k+1} - (x^{k+1})^T w_k$ is the modeling error made on the new example $x^{k+1}$ when the model has the parameters computed at iteration $k$. Hence, for each example, the weight update is proportional to the modeling error on that example.
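As an illustration, the following is a minimal Python sketch of one LMS update; the names `lms_step` and `step_size`, and the NumPy setting, are illustrative assumptions, not taken from the text.

```python
import numpy as np

def lms_step(w, x, y, mu):
    """One LMS update: w_{k+1} = w_k + mu_k * (y - x.w) * x."""
    error = y - x @ w          # modeling error on the new example
    return w + mu * error * x  # update proportional to that error

def step_size(k, alpha=1.0, beta=0.1):
    """A decreasing schedule mu_k = 1 / (alpha + beta*k), as suggested above."""
    return 1.0 / (alpha + beta * k)
```

Each new example $(x, y)$ triggers one such update, which is what makes the training adaptive.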
It can be shown that, under conditions that will not be described here, the LMS algorithm converges to the minimum of the total least squares cost function. The adaptive training of linear models is described in detail in Chap. 4.
2.5.2 Nonadaptive (Batch) Training of Static Models that Are Not Linear with Respect to Their Parameters
The present section is devoted to the batch training of models that are not
linear with respect to their parameters, such as feedforward neural networks.
Since the model $g(x, w)$ is not linear with respect to its parameters, the cost function

$$J(w) = \sum_{k=1}^{N} \big( y_p^k - g(x^k, w) \big)^2$$

(where $N$ is the number of examples) is not quadratic with respect to the parameters. Hence the gradient of the cost function is not linear with respect to the parameters, so that the least squares solution cannot be found as the solution of a linear system. Therefore, ordinary least squares techniques are of no use here, and one has to resort to more elaborate minimization techniques, which update the parameters iteratively as a function of the gradient of the cost function with respect to the parameters.
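To see why no linear system yields the minimum, consider a toy model that is nonlinear in its parameters, $g(x, w) = \tanh(x^T w)$; the gradient of $J$ is then itself a nonlinear function of $w$, so setting it to zero cannot be solved in closed form. The following sketch, in Python with NumPy, computes that cost and gradient; the model choice and all names are illustrative assumptions, not taken from the text.

```python
import numpy as np

def cost_and_gradient(w, X, y):
    """Least squares cost J(w) = sum_k (y_k - tanh(x_k . w))^2 and its gradient.

    X: (N, n) array of input examples; y: (N,) measured outputs.
    """
    pred = np.tanh(X @ w)    # model outputs g(x^k, w)
    resid = y - pred         # modeling errors
    cost = np.sum(resid ** 2)
    # Chain rule: dJ/dw = -2 * sum_k resid_k * (1 - pred_k**2) * x^k
    grad = -2.0 * X.T @ (resid * (1.0 - pred ** 2))
    return cost, grad
```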
Just as for linear models, training can be performed either adaptively or nonadaptively. In either case, each training iteration (or epoch) requires two ingredients:

• the computation of the gradient of the cost function,
• the updating of the parameters as a function of that gradient, in order to get closer to a minimum of the cost function (a sketch coupling the two steps is given after this list).
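Assuming the `cost_and_gradient` sketch above, a batch training loop coupling these two ingredients might look as follows; this is simple gradient descent with a fixed step, an illustrative sketch rather than the only possible update rule.

```python
def train_batch(w0, X, y, mu=0.01, epochs=1000):
    """Batch gradient descent: each epoch uses the whole training set."""
    w = w0.copy()
    for _ in range(epochs):
        _, grad = cost_and_gradient(w, X, y)  # ingredient 1: the gradient
        w = w - mu * grad                     # ingredient 2: the parameter update
    return w
```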
Those two points are discussed in the following. As a preliminary, however,
we consider the normalization of the inputs.
2.5.2.1 Input Normalization
Prior to training, the input variables must be normalized and centered: if the inputs have very different orders of magnitude, the smallest ones will not be taken into account during training.
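A minimal sketch of this preprocessing step, assuming NumPy and an examples-by-variables matrix `X` (the function name is illustrative):

```python
import numpy as np

def center_and_normalize(X):
    """Center each input variable and scale it to unit standard deviation."""
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std = np.where(std == 0.0, 1.0, std)  # guard against constant inputs
    return (X - mean) / std, mean, std
```

The same mean and standard deviation should then be applied to any new inputs presented to the trained model.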