Neural Identification of Controlled Dynamical Systems and Recurrent Networks - Neural Networks: Methodology and Applications


4.3.4 General Recursive Prediction Error Method (RPEM)

The general recursive prediction error method applies stochastic approximation to estimation problems. We have just provided some examples for linear identification. The general theory has been developed since the 1950s (Robbins and Monro did pioneering work). A detailed presentation is provided in [Kushner 1978]. The method has been used for adaptive neural network learning. Its advantage is that it is recursive, so that a large amount of data need not be stored. Its main drawback is its slow convergence. To apply the general method rigorously, one has to check a number of nontrivial assumptions. Precise convergence statements are given in [Ljung 1983; Benveniste et al. 1987; Duflo 1996]. We will give a

detailed treatment of the particular case of the identification of the NARX($p, r$) model

$$X(k+1) = f[X(k), \ldots, X(k-p+1), V(k+1), u(k), \ldots, u(k-r+1)].$$

This model is relevant for neural networks. It is a Markov model when its state representation is given as

$$X(k+1) = f[X(k), V(k+1), u(k)].$$

We assume that the model is stable and converges towards a unique stationary

regime.
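As a rough illustration of why the NARX recursion can be read as a Markov model, the sketch below stacks the last p states and the last r controls into vectors and checks numerically that the scalar recursion and the stacked recursion produce the same trajectory. The orders P = R = 2, the map `f`, the noise level, and all function names are hypothetical choices made for this example only; they are not taken from the text.

```python
import math
import random

P, R = 2, 2  # hypothetical orders p and r, chosen for illustration


def f(x_hist, v, u_hist):
    """Hypothetical stable map; x_hist = [x(k), x(k-1)], u_hist = [u(k), u(k-1)]."""
    return (0.5 * math.tanh(x_hist[0]) + 0.2 * x_hist[1]
            + 0.3 * u_hist[0] + 0.1 * u_hist[1] + v)


def simulate_narx(u, v, x_init):
    """Scalar form: x(k+1) = f[x(k), ..., x(k-p+1), v(k+1), u(k), ..., u(k-r+1)]."""
    x = list(x_init)  # holds x(0), x(1)
    for k in range(1, len(u) - 1):
        x_hist = [x[k - i] for i in range(P)]
        u_hist = [u[k - i] for i in range(R)]
        x.append(f(x_hist, v[k + 1], u_hist))
    return x


def simulate_markov(u, v, x_init):
    """Stacked form: the next stacked state depends only on the current one."""
    X = [x_init[1], x_init[0]]  # stacked state (x(1), x(0))
    xs = list(x_init)
    for k in range(1, len(u) - 1):
        U = [u[k], u[k - 1]]              # stacked control (u(k), u(k-1))
        X = [f(X, v[k + 1], U)] + X[:-1]  # new value in front, oldest dropped
        xs.append(X[0])
    return xs


rng = random.Random(0)
u = [rng.uniform(-1.0, 1.0) for _ in range(50)]
v = [rng.gauss(0.0, 0.1) for _ in range(50)]
traj_narx = simulate_narx(u, v, [0.0, 0.0])
traj_markov = simulate_markov(u, v, [0.0, 0.0])
same = traj_narx == traj_markov  # both forms describe the same process
```

The stacked update only shifts a window of past values, so all the memory of the process is carried by the current stacked state and the current stacked control.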

The function $f$ and the state noise $\{V(k)\}$ are both unknown. On the other hand, we assume that the state $X(k)$ is accurately determined at time $k$. We are looking for an adaptive nonlinear parametric identification scheme of the type $X(k+1) = g[X(k), u(k), w]$, obtained by minimization of the quadratic prediction error. The prediction error is defined for the input-output data $(x, u, y)$ and for a given value $w$ of the parameter vector by

$$\vartheta(y, x, u, w) = y - g(x, u, w).$$

We must compute the value of the parameter $w$ that minimizes the mean quadratic prediction error,

$$J(w) = \frac{1}{2} E\left[\| f(x, V, u) - g(x, u, w) \|^2\right],$$

where the mathematical expectation is taken over the probability law of the

state noise, and is then averaged over the stationary regime of the input vector

variable (state-control).
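A minimal sketch of how J(w) might be estimated in practice: under the stationarity assumption, the expectation can be approximated by a time average of half the squared prediction error along one recorded trajectory. The scalar first-order model `g`, the parameter values, the noise level, and the function names are all hypothetical and serve only to make the example runnable.

```python
import math
import random


def g(x, u, w):
    """Hypothetical parametric predictor with w = [w0, w1]."""
    return w[0] * math.tanh(x) + w[1] * u


def prediction_error(y, x, u, w):
    """Prediction error: y - g(x, u, w)."""
    return y - g(x, u, w)


def empirical_cost(xs, us, w):
    """Time average of 0.5 * error^2 over transitions (x(k), u(k)) -> x(k+1)."""
    n = len(xs) - 1
    total = 0.0
    for k in range(n):
        e = prediction_error(xs[k + 1], xs[k], us[k], w)
        total += 0.5 * e * e
    return total / n


def make_trajectory(w_true, noise_std, n, seed=0):
    """Simulate x(k+1) = g(x(k), u(k), w_true) + V(k+1), an assumed system."""
    rng = random.Random(seed)
    xs, us = [0.0], []
    for _ in range(n):
        u = rng.uniform(-1.0, 1.0)
        us.append(u)
        xs.append(g(xs[-1], u, w_true) + rng.gauss(0.0, noise_std))
    return xs, us


xs, us = make_trajectory([0.5, 0.3], noise_std=0.05, n=2000)
cost_true = empirical_cost(xs, us, [0.5, 0.3])   # close to half the noise variance
cost_wrong = empirical_cost(xs, us, [0.0, 0.0])  # larger: this model always predicts 0
```

In this toy setting the true system belongs to the model class, so at the true parameter the prediction error reduces to the state noise and the cost cannot fall below half the noise variance.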

In order to apply the stochastic gradient method, one has to compute the gradient of the function $\frac{1}{2}\|\vartheta(y, x, u, w)\|^2$ with respect to $w$. That gradient is equal to $-\frac{\partial g}{\partial w}(x, u, w)\,\vartheta(y, x, u, w)$. It will be denoted below as $G(y, x, u, w)$. Similarly, we will denote $G(k+1) = G[X(k+1), X(k), u(k), w(k)]$.
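The sketch below evaluates G for a hypothetical scalar model and uses it in a plain stochastic gradient recursion of the form w(k+1) = w(k) − γ(k) G(k+1). The model `g`, the decreasing gain γ(k) = 1/(1 + 0.01k), the noise level, and the simulated system that generates the data are all assumptions made for illustration; the text does not specify any of them.

```python
import math
import random


def g(x, u, w):
    """Hypothetical parametric predictor with w = [w0, w1]."""
    return w[0] * math.tanh(x) + w[1] * u


def grad_g(x, u, w):
    """Partial derivatives of g with respect to w0 and w1 (independent of w here)."""
    return [math.tanh(x), u]


def G(y, x, u, w):
    """Gradient of 0.5 * (y - g(x, u, w))**2 with respect to w."""
    err = y - g(x, u, w)
    return [-d * err for d in grad_g(x, u, w)]


def run_sgd(n_steps, w_true, noise_std=0.05, seed=0):
    """Recursive estimation: one gradient step per observed transition."""
    rng = random.Random(seed)
    w = [0.0, 0.0]  # initial guess w(0)
    x = 0.0
    for k in range(n_steps):
        u = rng.uniform(-1.0, 1.0)
        # Assumed data-generating system; the state x is observed exactly.
        x_next = g(x, u, w_true) + rng.gauss(0.0, noise_std)
        gamma = 1.0 / (1.0 + 0.01 * k)  # assumed decreasing gain
        grad = G(x_next, x, u, w)       # plays the role of G(k+1)
        w = [wi - gamma * gi for wi, gi in zip(w, grad)]
        x = x_next
    return w


w_hat = run_sgd(20000, [0.5, 0.3])
```

Each step uses only the current transition (X(k), u(k), X(k+1)) and the current estimate w(k), so no past data has to be kept in memory.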
