make the model satisfactory, with respect to a criterion that will be discussed
below.
The basic principles of parameter estimation are the following:
A set of $N$ measurements $\{y_p^k\}$ ($k = 1$ to $N$) of the quantity to be measured is available, which corresponds to $N$ values of the inputs $\{\boldsymbol{x}^k\} = \{[x_1^k, \ldots, x_q^k]\}$ ($k = 1$ to $N$). That set of observations is called the training set.
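For concreteness, the following minimal Python sketch (an illustration, not part of the source; the sizes, the parameter values, and the noise level are all hypothetical) shows one way such a training set can be represented: the $N$ input observations as the rows of an $N \times q$ array, and the $N$ measured outputs as a vector.

    import numpy as np

    rng = np.random.default_rng(0)
    N, q = 100, 3                          # N observations of q input variables (hypothetical sizes)

    X = rng.uniform(-1.0, 1.0, (N, q))     # row k is the input vector x^k = [x_1^k, ..., x_q^k]
    w_true = np.array([2.0, -1.0, 0.5])    # hypothetical parameters of the process being measured
    y_p = X @ w_true + 0.1 * rng.standard_normal(N)   # y_p[k]: k-th measured output (process + noise)

    # The training set is the collection of the N pairs (x^k, y_p^k).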
Because the training set is of finite size, the exact regression function cannot be derived; therefore, an approximation of the regression function is sought, within a family of functions that are deemed complex enough to account for the complexity of the data. The most reasonable approach consists in first trying to find an approximation of the regression function in the family of linear or affine functions (i.e., in performing linear regression). In that case, the model is sought under the form $g(\boldsymbol{x}, \boldsymbol{w}) = \boldsymbol{x}^T \boldsymbol{w} = \sum_{i=1}^{q} w_i x_i$; if the result of that model is not satisfactory, an approximation of the regression function must be sought in a more complex family of functions, either linear with respect to the parameters (polynomials, Gaussians with fixed centers and covariances, wavelets with fixed centers and dilations), or nonlinear with respect to the parameters (neural networks, Gaussians with adjustable centers and covariance matrices, etc.). If necessary, the complexity of the family of models is increased step by step, by increasing the degree of the polynomial, the number of Gaussians, the number of hidden neurons, etc.
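As an illustrative sketch of this progression (not from the source; numpy's least squares solver is used here as one possible fitting method, and the data generation is hypothetical), one would first fit the linear model $g(\boldsymbol{x}, \boldsymbol{w}) = \boldsymbol{x}^T \boldsymbol{w}$ and, only if that proves inadequate, enlarge the family of functions, e.g., with polynomial terms:

    import numpy as np

    rng = np.random.default_rng(0)
    N, q = 100, 3
    X = rng.uniform(-1.0, 1.0, (N, q))                                   # hypothetical inputs
    y_p = X @ np.array([2.0, -1.0, 0.5]) + 0.1 * rng.standard_normal(N)  # hypothetical measurements

    # Step 1: linear model g(x, w) = x^T w, fitted by least squares.
    w_lin, *_ = np.linalg.lstsq(X, y_p, rcond=None)

    # Step 2: if the linear model is not satisfactory, increase the complexity
    # of the family step by step, e.g. by adding degree-2 polynomial terms
    # (a larger family that is still linear with respect to the parameters).
    X_poly = np.hstack([X, X**2])
    w_poly, *_ = np.linalg.lstsq(X_poly, y_p, rcond=None)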
For a given family of functions, the values of the parameters $\boldsymbol{w}$ must be computed; this is done by minimizing a cost function that quantifies the “distance” between the predictions of the model and the measured values. For each observation $k$ of the training set, the residual is defined as $r_k = y_p^k - g(\boldsymbol{x}^k, \boldsymbol{w})$, where $y_p^k$ is the $k$th measured value of the process output, and where $\boldsymbol{x}^k$ is the $k$th measured value of the input vector. The least squares cost function, as defined in Chap. 1, is the sum of the squared residuals over all observations of the training set: $J(\boldsymbol{w}) = \sum_{k=1}^{N} \left(y_p^k - g(\boldsymbol{x}^k, \boldsymbol{w})\right)^2 = \boldsymbol{r}^T \boldsymbol{r}$, where $\boldsymbol{r}$ is the vector of residuals, of dimension $N$, whose components are the residuals $r_k$. If the modeling were perfect, the residual vector would be equal to zero, which is the absolute minimum of the cost function. However, since the measurements are noisy, it
is not desirable to find a model that is so complex that the minimum of the cost function would be equal to zero: such a model would reproduce the noise, in addition to reproducing the deterministic behavior of the process, whereas the purpose of modeling is to find a model that captures the deterministic part of the process and filters out the noise. Since there is no point in finding a model whose predictions would be more accurate than the measurements from which it is designed, the model designer will not try to find a model with zero cost function, nor even the absolute minimum of the cost function in a given family of models: a model will be sought whose prediction accuracy is on the order of the accuracy of the measurements.
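Continuing the hypothetical sketch above, the residual vector and the least squares cost $J(\boldsymbol{w}) = \boldsymbol{r}^T \boldsymbol{r}$ are computed directly from their definitions; note that, at the least squares solution, $J$ remains on the order of $N$ times the noise variance rather than zero, which is precisely the desired behavior:

    import numpy as np

    rng = np.random.default_rng(0)
    N, q = 100, 3
    X = rng.uniform(-1.0, 1.0, (N, q))                                   # hypothetical inputs
    y_p = X @ np.array([2.0, -1.0, 0.5]) + 0.1 * rng.standard_normal(N)  # noise std 0.1 (hypothetical)

    def cost(w, X, y_p):
        """Least squares cost J(w) = sum_k (y_p^k - g(x^k, w))^2 = r^T r."""
        r = y_p - X @ w          # residual vector; component k is r_k = y_p^k - g(x^k, w)
        return r @ r

    w_ls, *_ = np.linalg.lstsq(X, y_p, rcond=None)   # minimizes J over the linear family
    print(cost(w_ls, X, y_p))   # roughly N * (0.1)**2 = 1.0, not zero: the noise is not reproduced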