make the model satisfactory, with respect to a criterion that will be discussed
below.
The basic principles of parameter estimation are the following:
A set of $N$ measurements $\{y_p^k\}$ ($k = 1$ to $N$) of the quantity to be measured is available, which corresponds to $N$ values of the inputs $\{\boldsymbol{x}^k\} = \{[x_1^k, \ldots, x_q^k]\}$ ($k = 1$ to $N$). That set of observations is called the training set.
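For concreteness, the following minimal Python sketch (an illustration, not part of the source; the sizes, the parameter values, and the noise level are all hypothetical) shows one way such a training set can be represented: the $N$ input observations as the rows of an $N \times q$ array, and the $N$ measured outputs as a vector.

    import numpy as np

    rng = np.random.default_rng(0)
    N, q = 100, 3                          # N observations of q input variables (hypothetical sizes)

    X = rng.uniform(-1.0, 1.0, (N, q))     # row k is the input vector x^k = [x_1^k, ..., x_q^k]
    w_true = np.array([2.0, -1.0, 0.5])    # hypothetical parameters of the process being measured
    y_p = X @ w_true + 0.1 * rng.standard_normal(N)   # y_p[k]: k-th measured output (process + noise)

    # The training set is the collection of the N pairs (x^k, y_p^k).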
Because the training set is of finite size, the exact regression function cannot be derived; therefore, an approximation of the regression function is sought, within a family of functions that are deemed complex enough to account for the complexity of the data. The most reasonable approach consists in first trying to find an approximation of the regression function in the family of linear or affine functions (i.e., in performing linear regression). In that case, the model is sought under the form $g(\boldsymbol{x}, \boldsymbol{w}) = \boldsymbol{x}^T \boldsymbol{w} = \sum_{i=1}^{q} w_i x_i$; if the result of that model is not satisfactory, an approximation of the regression function must be sought in a more complex family of functions, either linear with respect to the parameters (polynomials, Gaussians with fixed centers and covariances, wavelets with fixed centers and dilations), or nonlinear with respect to the parameters (neural networks, Gaussians with adjustable centers and covariance matrices, etc.). If necessary, the complexity of the family of models is increased step by step, by increasing the degree of the polynomial, the number of Gaussians, the number of hidden neurons, etc.
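As an illustrative sketch of this progression (not from the source; numpy's least squares solver is used here as one possible fitting method, and the data generation is hypothetical), one would first fit the linear model $g(\boldsymbol{x}, \boldsymbol{w}) = \boldsymbol{x}^T \boldsymbol{w}$ and, only if that proves inadequate, enlarge the family of functions, e.g., with polynomial terms:

    import numpy as np

    rng = np.random.default_rng(0)
    N, q = 100, 3
    X = rng.uniform(-1.0, 1.0, (N, q))                                   # hypothetical inputs
    y_p = X @ np.array([2.0, -1.0, 0.5]) + 0.1 * rng.standard_normal(N)  # hypothetical measurements

    # Step 1: linear model g(x, w) = x^T w, fitted by least squares.
    w_lin, *_ = np.linalg.lstsq(X, y_p, rcond=None)

    # Step 2: if the linear model is not satisfactory, increase the complexity
    # of the family step by step, e.g. by adding degree-2 polynomial terms
    # (a larger family that is still linear with respect to the parameters).
    X_poly = np.hstack([X, X**2])
    w_poly, *_ = np.linalg.lstsq(X_poly, y_p, rcond=None)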
For a given family of functions, the values of the parameters $\boldsymbol{w}$ must be computed; this is done by minimizing a cost function that quantifies the “distance” between the predictions of the model and the measured values. For each observation $k$ of the training set, the residual is defined as $r_k = y_p^k - g(\boldsymbol{x}^k, \boldsymbol{w})$, where $y_p^k$ is the $k$th measured value of the process output, and where $\boldsymbol{x}^k$ is the $k$th measured value of the input vector. The least squares cost function, as defined in Chap. 1, is the sum of the squared residuals over all observations of the training set: $J(\boldsymbol{w}) = \sum_{k=1}^{N} \left(y_p^k - g(\boldsymbol{x}^k, \boldsymbol{w})\right)^2 = \boldsymbol{r}^T \boldsymbol{r}$, where $\boldsymbol{r}$ is the vector of residuals, of dimension $N$, whose components are the residuals $r_k$. If the modeling were perfect, the residual vector would be equal to zero, which is the absolute minimum of the cost function. However, since the measurements are noisy, it
is not desirable to find a model that is so complex that the minimum of the cost function would be equal to zero: such a model would reproduce the noise, in addition to reproducing the deterministic behavior of the process, whereas the purpose of modeling is to find a model that captures the deterministic part of the process and filters out the noise. Since there is no point in finding a model whose predictions would be more accurate than the measurements from which it is designed, the model designer will not try to find a model with zero cost function, nor even the absolute minimum of the cost function in a given family of models: a model will be sought whose prediction accuracy is on the order of the accuracy of the measurements.
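Continuing the hypothetical sketch above, the residual vector and the least squares cost $J(\boldsymbol{w}) = \boldsymbol{r}^T \boldsymbol{r}$ are computed directly from their definitions; note that, at the least squares solution, $J$ remains on the order of $N$ times the noise variance rather than zero, which is precisely the desired behavior:

    import numpy as np

    rng = np.random.default_rng(0)
    N, q = 100, 3
    X = rng.uniform(-1.0, 1.0, (N, q))                                   # hypothetical inputs
    y_p = X @ np.array([2.0, -1.0, 0.5]) + 0.1 * rng.standard_normal(N)  # noise std 0.1 (hypothetical)

    def cost(w, X, y_p):
        """Least squares cost J(w) = sum_k (y_p^k - g(x^k, w))^2 = r^T r."""
        r = y_p - X @ w          # residual vector; component k is r_k = y_p^k - g(x^k, w)
        return r @ r

    w_ls, *_ = np.linalg.lstsq(X, y_p, rcond=None)   # minimizes J over the linear family
    print(cost(w_ls, X, y_p))   # roughly N * (0.1)**2 = 1.0, not zero: the noise is not reproduced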