whose prediction error is on the order of the accuracy of the measurements.
That crucial problem is discussed below, in the section devoted to model
selection.
Empirical vs. Theoretical Cost Functions
The cost function J(w) is sometimes called the empirical cost function, as opposed to the theoretical cost function $\int \left(y_p(x) - g(x, w)\right)^2 p(x)\, dx$; the latter is the quantity that one would actually like to minimize, but it obviously cannot be computed.
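As a minimal illustration (not from the text), the sketch below evaluates the empirical cost by summing squared prediction errors over a finite training set; the model g, the training arrays, and the parameter values are invented placeholders. The theoretical cost would require the unknown density p(x), so only this empirical sum over the available examples can actually be computed.

```python
import numpy as np

def empirical_cost(w, g, x_train, y_train):
    """Empirical least squares cost J(w) = sum_k (y_p^k - g(x^k, w))^2."""
    predictions = np.array([g(x_k, w) for x_k in x_train])
    return np.sum((y_train - predictions) ** 2)

# Illustrative model, linear in its parameters: g(x, w) = w . x
g = lambda x_k, w: np.dot(w, x_k)

rng = np.random.default_rng(0)
x_train = rng.uniform(-1.0, 1.0, size=(100, 3))
y_train = x_train @ np.array([0.5, -1.0, 2.0])   # noiseless toy measurements

print(empirical_cost(np.zeros(3), g, x_train, y_train))
```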
Global Minima and Local Minima
If the model is linear with respect to its parameters, the least squares cost function is quadratic with respect to them, so it has a single, global minimum. If the model is not linear with respect to the parameters (e.g., a neural network), the least squares cost function generally has several local minima, one of which must be selected. That makes the model selection problem somewhat more complicated than for models that are linear with respect to the parameters: it is the price to be paid for taking advantage of parsimony, which is an asset of models that are not linear with respect to their parameters.
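To make the first case concrete, the following sketch (an illustration, not taken from the text) fits a model that is linear in its parameters, here a polynomial in x, by least squares; since the cost is quadratic in w, its minimum is unique and is obtained in closed form from the normal equations. The data and variable names are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Model linear in its parameters: g(x, w) = w0 + w1*x + w2*x^2
# (nonlinear in x, but linear in w).
x = rng.uniform(-1.0, 1.0, size=50)
y = 1.0 + 2.0 * x - 0.5 * x**2 + 0.1 * rng.standard_normal(50)

# Design matrix: one column per parameter.
X = np.column_stack([np.ones_like(x), x, x**2])

# The least squares cost ||y - X w||^2 is quadratic in w, so its unique
# minimum solves the normal equations X^T X w = X^T y.
w_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(w_hat)
```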
The methods that can be used for minimizing the cost function fall into
two categories:
• nonadaptive training, also called batch training or off-line training, whereby the cost function that is minimized takes into account all elements of the training set (as is the case for the least squares cost function defined above); such methods require that all elements of the training set be available when training starts;
• adaptive training, also called on-line training, whereby the parameters of the model are updated sequentially, as a function of a partial cost related to each example k⁴: $J^k(w) = \left(y_p^k - g(x^k, w)\right)^2$. Such techniques are useful when new examples become available while training is already taking place.
Adaptive training can be performed even when all examples are available before training starts, whereas a nonadaptive technique cannot be used unless all examples are available. In practice, the following strategy is frequently used: the model is first trained nonadaptively, then it is updated by adaptive training during its operation, for instance to adapt the model to slow drifts of the parameters of the process (due to wear, ageing, etc.).
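The sketch below is illustrative only: the data, learning rate, and model are invented, and simple gradient steps stand in for whatever training algorithm is actually used. It contrasts the two strategies on a model linear in its parameters: batch training updates w from the gradient of the total cost over the whole training set, while adaptive (on-line) training updates w after each example from the gradient of the partial cost J^k(w).

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(-1.0, 1.0, size=(200, 2))
w_true = np.array([1.5, -0.7])
y = x @ w_true + 0.05 * rng.standard_normal(200)   # noisy toy measurements

lr = 0.05

# Nonadaptive (batch) training: each step uses the gradient of the
# total cost, computed over the whole training set.
w_batch = np.zeros(2)
for _ in range(200):
    grad = -2.0 * x.T @ (y - x @ w_batch)
    w_batch -= lr * grad / len(y)

# Adaptive (on-line) training: the parameters are updated after each
# example, using the gradient of the partial cost J^k(w) = (y_p^k - g(x^k, w))^2.
w_online = np.zeros(2)
for _ in range(5):                       # a few passes over the data
    for x_k, y_k in zip(x, y):
        grad_k = -2.0 * x_k * (y_k - x_k @ w_online)
        w_online -= lr * grad_k

print(w_batch, w_online)
```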
In the following, the training of models that are linear with respect to their parameters (the popular least squares method) will first be outlined. Then the training (nonadaptive and adaptive) of models that are nonlinear with respect to their parameters will be described.
⁴ The least squares cost function will also be called the total cost, as opposed to the partial cost.