Note that if modeling were perfect, i.e., if the output of the model $g(x, w)$
were identical to the regression function, and if the numbers of observations of
the training set and of the validation set were very large, then both the TMSE
and the VMSE would be equal to the standard deviation of the measurement noise
(provided $N_T \gg 1$ and $N_V \gg 1$). Therefore, the goal of modeling from examples
can be expressed as follows: find the most parsimonious model (e.g., the most
parsimonious feedforward neural network) whose TMSE and VMSE are of the
same order of magnitude, and are as small as possible, i.e., of the order
of magnitude of the standard deviation of the noise.
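As an illustration, the following sketch estimates the TMSE and VMSE on synthetic data with a known noise level; the data, the model, and all names are assumptions for the example. The TMSE and VMSE are taken as root-mean-square errors, consistent with the statement that they approach the noise standard deviation. Since the model is assumed perfect here, both values come out close to that standard deviation.

```python
import numpy as np

rng = np.random.default_rng(0)
noise_std = 0.1  # known standard deviation of the measurement noise

# Hypothetical regression function; measurements are this function plus noise.
def regression_function(x):
    return np.sin(2 * np.pi * x)

x_train = rng.uniform(0, 1, 500)
y_train = regression_function(x_train) + rng.normal(0, noise_std, x_train.size)
x_valid = rng.uniform(0, 1, 500)
y_valid = regression_function(x_valid) + rng.normal(0, noise_std, x_valid.size)

# Stand-in for a trained model g(x, w): training is assumed perfect,
# so the model output equals the regression function itself.
def g(x):
    return regression_function(x)

# Root-mean-square errors on the training and validation sets.
tmse = np.sqrt(np.mean((y_train - g(x_train)) ** 2))
vmse = np.sqrt(np.mean((y_valid - g(x_valid)) ** 2))
print(f"TMSE = {tmse:.3f}, VMSE = {vmse:.3f}, noise std = {noise_std}")
# With a perfect model and large N_T, N_V, both values approach noise_std.
```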
What to Do in Practice?
The purpose of this book is to provide practical methodologies, founded on
sound theoretical bases, for model design through training, whether supervised
or unsupervised. A complete methodology for supervised training will
be described in Chap. 2 (together with complements in Chap. 3), and a
methodology for unsupervised training will be described in Chap. 7.
1.2.2.4 The Training of Feedforward Neural Networks: An Optimization Problem
Once the complexity of the model, i.e., the number of hidden neurons of a
feedforward neural network, is chosen, training can be performed: one must
estimate the parameters of the neural network that, for that number of
parameters, yields the minimum mean square error on the training set.
Therefore, training is a numerical optimization problem.
For simplicity, we consider a model with a single output g ( x , w ). The
training set contains N examples. The least squares cost function was defined
above as
$$J(w) = \frac{1}{2} \sum_{k=1}^{N} \left[ y_p(x^k) - g(x^k, w) \right]^2,$$
where $x^k$ is the vector of the values of the variables for example $k$, $y_p(x^k)$
is the corresponding measured value of the quantity to be modeled, $w$ is
the vector of the parameters (or weights) of the model, and $g(x^k, w)$ is the
output value of the model with parameters $w$ for the vector of variables $x^k$.
Therefore, the cost function is a function of all adjustable parameters $w$ of the
model. Training consists in finding the parameter vector $w$ for which $J(w)$ is
minimum.
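As a concrete illustration, here is a minimal sketch of this cost function in Python. The single-hidden-neuron model, the synthetic data, and all names are assumptions for the example, not part of the text.

```python
import numpy as np

def cost_J(w, X, y_p, g):
    """Least squares cost J(w) = (1/2) * sum_k (y_p(x^k) - g(x^k, w))^2."""
    residuals = y_p - np.array([g(x, w) for x in X])
    return 0.5 * np.sum(residuals ** 2)

# Illustrative model: one tanh hidden neuron with a linear output weight;
# w = [w0, w1, w2] (placeholder parameter names).
def g(x, w):
    return w[2] * np.tanh(w[0] + w[1] * x)

X = np.linspace(-1, 1, 50)
y_p = np.tanh(0.5 + 2.0 * X)  # synthetic measurements (noise-free here)
print(cost_J(np.array([0.5, 2.0, 1.0]), X, y_p, g))  # ~0: parameters match
```

Minimizing $J(w)$ over $w$ is precisely the numerical optimization problem that training algorithms solve.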
For a model that is linear with respect to its parameters (e.g., radial basis
functions with fixed centers and widths, polynomials, etc.), the cost
function $J$ is quadratic with respect to the parameters: the ordinary least
squares methods can be used. They are simple and efficient. However, the
resulting models are not parsimonious.
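For instance, fitting a polynomial by ordinary least squares reduces to solving a linear system in closed form; a minimal sketch, assuming synthetic data and NumPy (the data and coefficient values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(-1, 1, 100)
y = 1.0 - 2.0 * x + 0.5 * x**2 + rng.normal(0, 0.05, x.size)

# Design matrix for a degree-2 polynomial: columns are 1, x, x^2.
# The model is linear in w, so J(w) is quadratic in w and has a unique
# minimum, found directly by ordinary least squares.
Phi = np.vander(x, 3, increasing=True)
w, *_ = np.linalg.lstsq(Phi, y, rcond=None)
print(w)  # approximately [1.0, -2.0, 0.5]
```

No iterative search is needed here, in contrast with models that are nonlinear in their parameters, such as feedforward neural networks.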