function related to example $k$,

$$J^k(w) = \big( y_p^k - g(\zeta^k, w) \big)^2,$$
where $w_k$ denotes the value of the vector of parameters after iteration $k$, i.e., after the parameter update related to example $k$. The algorithm is
$$w_{k+1} = w_k + \mu_k \big( y_p^{k+1} - (x^{k+1})^T w_k \big)\, x^{k+1},$$
where $\mu_k$ is a sequence of positive numbers (for instance $\mu_k$ = constant, or $\mu_k = 1/(\alpha + \beta k)$). Note that $y_p^{k+1} - (x^{k+1})^T w_k$ is the modeling error made on the new example $x^{k+1}$ when the model has the parameters computed at iteration $k$. Hence, for each example, the weight update is proportional to the modeling error on that example.
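As an illustration, the following is a minimal Python sketch of one LMS update; the names `lms_step` and `step_size`, and the NumPy setting, are illustrative assumptions, not taken from the text.

```python
import numpy as np

def lms_step(w, x, y, mu):
    """One LMS update: w_{k+1} = w_k + mu_k * (y - x.w) * x."""
    error = y - x @ w          # modeling error on the new example
    return w + mu * error * x  # update proportional to that error

def step_size(k, alpha=1.0, beta=0.1):
    """A decreasing schedule mu_k = 1 / (alpha + beta*k), as suggested above."""
    return 1.0 / (alpha + beta * k)
```

Each new example $(x, y)$ triggers one such update, which is what makes the training adaptive.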
It can be shown that, under conditions that will not be described here, the LMS algorithm converges to the minimum of the total least squares cost function. The adaptive training of linear models is described in detail in Chap. 4.
2.5.2 Nonadaptive (Batch) Training of Static Models that Are Not Linear with Respect to Their Parameters
The present section is devoted to the batch training of models that are not
linear with respect to their parameters, such as feedforward neural networks.
Since the model $g(x, w)$ is not linear with respect to its parameters, the cost function

$$J(w) = \sum_{k=1}^{N} \big( y_p^k - g(x^k, w) \big)^2$$

(where $N$ is the number of examples) is not quadratic with respect to the parameters. Hence the gradient of the cost function is not linear with respect to the parameters, so that the least squares solution cannot be found as the solution of a linear system. Therefore, ordinary least squares techniques are of no use here, and one has to resort to more elaborate minimization techniques, which update the parameters iteratively as a function of the gradient of the cost function with respect to the parameters.
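To see why no linear system yields the minimum, consider a toy model that is nonlinear in its parameters, $g(x, w) = \tanh(x^T w)$; the gradient of $J$ is then itself a nonlinear function of $w$, so setting it to zero cannot be solved in closed form. The following sketch, in Python with NumPy, computes that cost and gradient; the model choice and all names are illustrative assumptions, not taken from the text.

```python
import numpy as np

def cost_and_gradient(w, X, y):
    """Least squares cost J(w) = sum_k (y_k - tanh(x_k . w))^2 and its gradient.

    X: (N, n) array of input examples; y: (N,) measured outputs.
    """
    pred = np.tanh(X @ w)    # model outputs g(x^k, w)
    resid = y - pred         # modeling errors
    cost = np.sum(resid ** 2)
    # Chain rule: dJ/dw = -2 * sum_k resid_k * (1 - pred_k**2) * x^k
    grad = -2.0 * X.T @ (resid * (1.0 - pred ** 2))
    return cost, grad
```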
Just as for linear models, training can be performed either adaptively or nonadaptively. In either case, each training iteration (or epoch) requires two ingredients:

• the computation of the gradient of the cost function,
• the updating of the parameters as a function of that gradient, in order to get closer to a minimum of the cost function (a sketch coupling the two steps is given after this list).
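Assuming the `cost_and_gradient` sketch above, a batch training loop coupling these two ingredients might look as follows; this is simple gradient descent with a fixed step, an illustrative sketch rather than the only possible update rule.

```python
def train_batch(w0, X, y, mu=0.01, epochs=1000):
    """Batch gradient descent: each epoch uses the whole training set."""
    w = w0.copy()
    for _ in range(epochs):
        _, grad = cost_and_gradient(w, X, y)  # ingredient 1: the gradient
        w = w - mu * grad                     # ingredient 2: the parameter update
    return w
```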
Those two points are discussed in the following. As a preliminary, however,
we consider the normalization of the inputs.
2.5.2.1 Input Normalization
Prior to training, the input variables must be normalized and centered: if the inputs have very different orders of magnitude, the smallest ones will not be taken into account during training.
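A minimal sketch of this preprocessing step, assuming NumPy and an examples-by-variables matrix `X` (the function name is illustrative):

```python
import numpy as np

def center_and_normalize(X):
    """Center each input variable and scale it to unit standard deviation."""
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std = np.where(std == 0.0, 1.0, std)  # guard against constant inputs
    return (X - mean) / std, mean, std
```

The same mean and standard deviation should then be applied to any new inputs presented to the trained model.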