4.3.4 General Recursive Prediction Error Method (RPEM)
The general recursive prediction error method is an application of stochastic
approximation to estimation. We have just provided some examples for linear
identification. The general theory has been developed since the fifties
(Robbins and Monro did some pioneering work). A detailed presentation is
provided in [Kushner 1978]. The method has been used for adaptive neural
network learning. Its advantage is that it is recursive, so that storage of a
large amount of data is not necessary; its main drawback is its slow
convergence. To apply the general method with full confidence, one has to
check a number of non-trivial assumptions. Precise convergence statements
are given in [Ljung 1983; Benveniste et al. 1987; Duflo 1996]. We will give a
detailed treatment of the particular case of the NARX(p, r) model identification

X(k+1) = f[X(k), ..., X(k − p + 1), V(k+1), u(k), ..., u(k − r + 1)].

This model is relevant for neural networks. It is a Markov model when its
state representation is given as

X(k+1) = f[X(k), V(k+1), u(k)].
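As a concrete illustration (not taken from the text), the Markov form above can be simulated with a hand-picked scalar map f and Gaussian state noise; the particular map, coefficients, input sequence, and noise level below are all assumptions made for this sketch:

```python
import math
import random

def f(x, v, u):
    # Hypothetical scalar state map: a saturating nonlinearity in the state,
    # plus a control term and additive state noise (all coefficients assumed).
    return 0.5 * math.tanh(x) + 0.3 * u + v

def simulate(n_steps, seed=0):
    """Simulate X(k+1) = f[X(k), V(k+1), u(k)] from X(0) = 0."""
    rng = random.Random(seed)
    x, traj = 0.0, [0.0]
    for k in range(n_steps):
        u = math.sin(0.1 * k)        # arbitrary bounded input sequence u(k)
        v = rng.gauss(0.0, 0.05)     # state noise V(k+1)
        x = f(x, v, u)
        traj.append(x)
    return traj

traj = simulate(200)
```

Because the state map is a contraction here (|tanh| ≤ 1 with coefficient 0.5), the trajectory stays bounded, consistent with the stability assumption made just below.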
We assume that the model is stable and converges towards a unique stationary
regime.
Function f, as well as the state noise {V(k)}, are unknown. Conversely, we
assume that the state X(k) is accurately determined at time k. We are
looking for an adaptive nonlinear parametric identification scheme of the type
X(k+1) = g[X(k), u(k), w], obtained by minimization of the quadratic prediction
error. The prediction error is defined for the input-output data (x, u, y) and
for a given value w of the parameter vector by

ϑ(y, x, u, w) = y − g(x, u, w).
We must compute the value of the parameter w that minimizes the mean
quadratic prediction error

J(w) = ½ E{[f(x, V, u) − g(x, u, w)]²},
where the mathematical expectation is taken over the probability law of the
state noise, and is then averaged over the stationary regime of the input vector
variable (state-control).
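Given the stationarity assumed above, J(w) can be approximated empirically by averaging the squared prediction error along one long trajectory of the model. In the sketch below, the linear map standing in for the unknown f, the predictor g, and all numerical values are illustrative assumptions:

```python
import random

def g(x, u, w):
    # Hypothetical linear-in-parameters predictor g(x, u, w).
    return w[0] * x + w[1] * u

def estimate_J(w, n=5000, seed=1):
    """Monte Carlo estimate of J(w) = (1/2) E[(f(x,V,u) - g(x,u,w))^2],
    averaged along one trajectory of the (assumed ergodic) model."""
    rng = random.Random(seed)
    x, total = 0.0, 0.0
    for k in range(n):
        u = rng.uniform(-1.0, 1.0)
        v = rng.gauss(0.0, 0.1)
        x_next = 0.8 * x + 0.2 * u + v   # stands in for the unknown f
        total += 0.5 * (x_next - g(x, u, w)) ** 2
        x = x_next
    return total / n
```

As expected, the estimate is smallest near the true coefficients, where only the state-noise variance remains in the residual.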
In order to apply the stochastic gradient method, one has to compute the
gradient of the function ½ ϑ(y, x, u, w)² with respect to w. That gradient
is equal to −(∂g/∂w)(y, x, u, w) ϑ(y, x, u, w). It will be denoted below by
G(y, x, u, w). Similarly, we will denote G(k+1) = G[X(k+1), X(k), u(k),
w(k)].
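The resulting stochastic gradient iteration is w(k+1) = w(k) − γ(k) G(k+1). A minimal sketch, for a linear-in-the-parameters predictor g(x, u, w) = w₀x + w₁u (so that ∂g/∂w = (x, u)) and a hypothetical linear true system, chosen so that convergence can be checked against known coefficients:

```python
import random

def rpem_linear(n_steps=20000, seed=2):
    """Stochastic-gradient identification w(k+1) = w(k) - gamma(k) * G(k+1),
    where G = -(dg/dw) * theta for the predictor g(x, u, w) = w0*x + w1*u.
    The 'true' system (an assumption of this sketch) is
    x(k+1) = 0.8 x(k) + 0.2 u(k) + noise."""
    rng = random.Random(seed)
    w = [0.0, 0.0]
    x = 0.0
    for k in range(n_steps):
        u = rng.uniform(-1.0, 1.0)
        v = rng.gauss(0.0, 0.1)
        x_next = 0.8 * x + 0.2 * u + v
        theta = x_next - (w[0] * x + w[1] * u)   # prediction error ϑ
        gamma = 10.0 / (k + 50)                  # decreasing gain
        # G = -(dg/dw) * theta, so w <- w - gamma * G = w + gamma * (dg/dw) * theta
        w[0] += gamma * x * theta
        w[1] += gamma * u * theta
        x = x_next
    return w
```

With a decreasing gain satisfying the usual stochastic approximation conditions, the iterates approach the true coefficients; a small constant gain would instead give a tracking (adaptive) variant, at the price of a residual fluctuation around the optimum.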