whose prediction error is on the order of the accuracy of the measurements.
That crucial problem is discussed below, in the section devoted to model
selection.
Empirical vs. Theoretical Cost Functions
The cost function J(w) is sometimes called the empirical cost function, as opposed to the theoretical cost function $\int \left(y_p(x) - g(x, w)\right)^2 p(x)\, dx$; the latter is the quantity that one would actually like to minimize, but it obviously cannot be computed.
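As a minimal illustration (not from the text), the sketch below evaluates the empirical cost by summing squared prediction errors over a finite training set; the model g, the training arrays, and the parameter values are invented placeholders. The theoretical cost would require the unknown density p(x), so only this empirical sum over the available examples can actually be computed.

```python
import numpy as np

def empirical_cost(w, g, x_train, y_train):
    """Empirical least squares cost J(w) = sum_k (y_p^k - g(x^k, w))^2."""
    predictions = np.array([g(x_k, w) for x_k in x_train])
    return np.sum((y_train - predictions) ** 2)

# Illustrative model, linear in its parameters: g(x, w) = w . x
g = lambda x_k, w: np.dot(w, x_k)

rng = np.random.default_rng(0)
x_train = rng.uniform(-1.0, 1.0, size=(100, 3))
y_train = x_train @ np.array([0.5, -1.0, 2.0])   # noiseless toy measurements

print(empirical_cost(np.zeros(3), g, x_train, y_train))
```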
Global Minima and Local Minima
If the model is linear with respect to its parameters, the least squares cost function is quadratic with respect to them, so it has a single, global minimum. If the model is not linear with respect to the parameters (e.g., a neural network), the least squares cost function generally has several local minima, one of which must be selected. That makes the model selection problem somewhat more complicated than for models that are linear with respect to the parameters: it is the price to be paid for taking advantage of parsimony, which is an asset of models that are not linear with respect to their parameters.
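To make the first case concrete, the following sketch (an illustration, not taken from the text) fits a model that is linear in its parameters, here a polynomial in x, by least squares; since the cost is quadratic in w, its minimum is unique and is obtained in closed form from the normal equations. The data and variable names are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Model linear in its parameters: g(x, w) = w0 + w1*x + w2*x^2
# (nonlinear in x, but linear in w).
x = rng.uniform(-1.0, 1.0, size=50)
y = 1.0 + 2.0 * x - 0.5 * x**2 + 0.1 * rng.standard_normal(50)

# Design matrix: one column per parameter.
X = np.column_stack([np.ones_like(x), x, x**2])

# The least squares cost ||y - X w||^2 is quadratic in w, so its unique
# minimum solves the normal equations X^T X w = X^T y.
w_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(w_hat)
```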
The methods that can be used for minimizing the cost function fall into
two categories:
• nonadaptive training, also called batch training or off-line training, whereby the cost function that is minimized takes into account all elements of the training set (as is the case for the least squares cost function defined above); such methods require that all elements of the training set be available when training starts;
• adaptive training, also called on-line training, whereby the parameters of the model are updated sequentially, as a function of a partial cost related to each example k⁴: $J^k(w) = \left(y_p^k - g(x^k, w)\right)^2$. Such techniques are useful when new examples become available while training is already taking place.
Adaptive training can be performed even when all examples are available before training starts, whereas a nonadaptive technique cannot be used unless all examples are available. In practice, the following strategy is frequently used: the model is first trained nonadaptively, then it is updated by adaptive training during its operation, for instance to adapt the model to slow drifts of the parameters of the process (due to wear, ageing, etc.).
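The sketch below is illustrative only: the data, learning rate, and model are invented, and simple gradient steps stand in for whatever training algorithm is actually used. It contrasts the two strategies on a model linear in its parameters: batch training updates w from the gradient of the total cost over the whole training set, while adaptive (on-line) training updates w after each example from the gradient of the partial cost J^k(w).

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(-1.0, 1.0, size=(200, 2))
w_true = np.array([1.5, -0.7])
y = x @ w_true + 0.05 * rng.standard_normal(200)   # noisy toy measurements

lr = 0.05

# Nonadaptive (batch) training: each step uses the gradient of the
# total cost, computed over the whole training set.
w_batch = np.zeros(2)
for _ in range(200):
    grad = -2.0 * x.T @ (y - x @ w_batch)
    w_batch -= lr * grad / len(y)

# Adaptive (on-line) training: the parameters are updated after each
# example, using the gradient of the partial cost J^k(w) = (y_p^k - g(x^k, w))^2.
w_online = np.zeros(2)
for _ in range(5):                       # a few passes over the data
    for x_k, y_k in zip(x, y):
        grad_k = -2.0 * x_k * (y_k - x_k @ w_online)
        w_online -= lr * grad_k

print(w_batch, w_online)
```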
In the following, the training of models that are linear with respect to their parameters (the popular least squares method) will first be outlined. Then the training (nonadaptive and adaptive) of models that are nonlinear with respect to their parameters will be described.
⁴ The least squares cost function will also be called the total cost, as opposed to the partial cost.