4.3.4 General Recursive Prediction Error Method (RPEM)
The general recursive prediction error method applies stochastic
approximation to estimation problems. We have just provided some examples
for linear identification. The general theory has been developed since the
1950s, starting with the pioneering work of Robbins and Monro. A detailed
presentation is provided in [Kushner 1978]. The method has been used for
adaptive neural network learning. Its advantage is that it is recursive, so
that a large amount of data need not be stored. Its main drawback is its
slow convergence. Applying the general method rigorously requires checking
a number of non-trivial assumptions. Precise convergence statements
are given in [Ljung 1983; Benveniste et al. 1987; Duflo 1996]. We will give a
detailed treatment of the particular case of NARX(p, r) model identification:

$$X(k+1) = f[X(k), \ldots, X(k-p+1), V(k+1), u(k), \ldots, u(k-r+1)].$$

This model is relevant for neural networks. It is a Markov model when its
state representation is given as

$$X(k+1) = f[X(k), V(k+1), u(k)].$$
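To make the NARX(p, r) recursion concrete, the following minimal sketch simulates such a process in plain Python. The map `example_f`, the uniform input sequence, the Gaussian noise and its standard deviation are all hypothetical choices made for illustration; the text itself leaves $f$ and the noise law unspecified.

```python
import math
import random


def simulate_narx(f, p, r, n_steps, noise_std=0.1, seed=0):
    """Simulate X(k+1) = f([X(k)..X(k-p+1)], V(k+1), [u(k)..u(k-r+1)])."""
    rng = random.Random(seed)
    x_hist = [0.0] * p  # last p states, most recent first
    u_hist = [0.0] * r  # last r controls, most recent first
    xs, us = [], []
    for _ in range(n_steps):
        u = rng.uniform(-1.0, 1.0)      # assumed exogenous input u(k)
        u_hist = [u] + u_hist[:-1]
        v = rng.gauss(0.0, noise_std)   # assumed Gaussian state noise V(k+1)
        x_next = f(x_hist, v, u_hist)
        x_hist = [x_next] + x_hist[:-1]
        xs.append(x_next)
        us.append(u)
    return xs, us


def example_f(x_hist, v, u_hist):
    """Hypothetical stable map: tanh keeps the noise-free part bounded."""
    return math.tanh(0.5 * x_hist[0] - 0.2 * x_hist[1] + 0.8 * u_hist[0]) + v


xs, us = simulate_narx(example_f, p=2, r=1, n_steps=200)
```

Stacking the last p states and the last r controls into `x_hist` and `u_hist` is what turns the recursion into a first-order (Markov) update: the next pair of histories depends only on the current pair and on the new noise and input samples.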
We assume that the model is stable and converges towards a unique stationary
regime.
Both the function $f$ and the state noise $\{V(k)\}$ are unknown. In
contrast, we assume that the state $X(k)$ is accurately determined at time
$k$. We are looking for an adaptive nonlinear parametric identification
scheme of the type $X(k+1) = g[X(k), u(k), w]$, obtained by minimization of
the quadratic prediction error. The prediction error is defined for the
input-output data $(x, u, y)$ and for a given value $w$ of the parameter
vector by:

$$\vartheta(y, x, u, w) = y - g(x, u, w).$$
We must compute the value of the parameter $w$ that minimizes the mean
quadratic prediction error,

$$J(w) = \frac{1}{2} E\left[\big(f(x, V, u) - g(x, u, w)\big)^2\right],$$
where the mathematical expectation is taken over the probability law of the
state noise, and is then averaged over the stationary regime of the input vector
variable (state-control).
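As an illustration of the cost $J(w)$, the sketch below replaces the expectation by a time average along one simulated trajectory, which is reasonable only under the stationarity assumption stated above. The first-order model, the predictor `g` (linear in `w`), the "true" parameter values and the noise level are hypothetical and chosen so that the data-generating map belongs to the parametric family.

```python
import math
import random


def g(x, u, w):
    """Hypothetical parametric predictor g(x, u, w)."""
    return w[0] * math.tanh(x) + w[1] * u


def generate_data(n_steps, w_true, noise_std, seed=0):
    """Generate triples (x, u, y) with y = g(x, u, w_true) + noise."""
    rng = random.Random(seed)
    x = 0.0
    samples = []
    for _ in range(n_steps):
        u = rng.uniform(-1.0, 1.0)
        y = g(x, u, w_true) + rng.gauss(0.0, noise_std)
        samples.append((x, u, y))
        x = y  # the observed output becomes the next state
    return samples


def empirical_cost(samples, w):
    """Time average of half the squared prediction error y - g(x, u, w)."""
    total = 0.0
    for x, u, y in samples:
        err = y - g(x, u, w)
        total += 0.5 * err * err
    return total / len(samples)


samples = generate_data(5000, w_true=[0.5, 0.8], noise_std=0.1)
j_true = empirical_cost(samples, [0.5, 0.8])   # cost at the generating parameters
j_wrong = empirical_cost(samples, [0.0, 0.0])  # cost of the zero predictor
```

At the generating parameters the residual reduces to the state noise, so `j_true` should be close to half the noise variance, while any other parameter value adds a model-mismatch term on top of it.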
In order to apply the stochastic gradient method, one has to compute the
gradient of the function $\frac{1}{2}[\vartheta(y, x, u, w)]^2$ with respect
to $w$. That gradient is equal to
$-\frac{\partial g}{\partial w}(x, u, w)\,\vartheta(y, x, u, w)$. It will be
denoted below as $G(y, x, u, w)$. Similarly, we will denote
$G(k+1) = G[X(k+1), X(k), u(k), w(k)]$.
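The gradient $G(k+1)$ can be used in a recursive update of the form $w(k+1) = w(k) - \gamma_k G(k+1)$. The sketch below is one possible instance under stated assumptions, not the text's own algorithm: the predictor `g` is a hypothetical model that is linear in `w`, the data come from a simulated system with known parameters, and the decreasing gain `gamma0 / (1 + k / tau)` is an illustrative schedule.

```python
import math
import random


def g(x, u, w):
    """Hypothetical parametric predictor g(x, u, w)."""
    return w[0] * math.tanh(x) + w[1] * u


def grad_g(x, u, w):
    """Partial derivatives of g with respect to w[0] and w[1]."""
    return [math.tanh(x), u]


def gradient_G(y, x, u, w):
    """G(y, x, u, w) = -(dg/dw)(x, u, w) * (y - g(x, u, w))."""
    err = y - g(x, u, w)  # prediction error
    return [-d * err for d in grad_g(x, u, w)]


def run_stochastic_gradient(n_steps=20000, gamma0=0.5, tau=200.0, seed=1):
    rng = random.Random(seed)
    w_true = [0.5, 0.8]  # used only to generate data, hidden from the learner
    w = [0.0, 0.0]       # initial estimate w(0)
    x = 0.0
    for k in range(n_steps):
        u = rng.uniform(-1.0, 1.0)
        x_next = g(x, u, w_true) + rng.gauss(0.0, 0.1)  # observed X(k+1)
        gamma = gamma0 / (1.0 + k / tau)                # assumed decreasing gain
        grad = gradient_G(x_next, x, u, w)              # G(k+1)
        w = [wi - gamma * gi for wi, gi in zip(w, grad)]
        x = x_next
    return w


w_hat = run_stochastic_gradient()
```

Only the current state, input and parameter estimate are kept in memory at each step, which reflects the recursive nature of the method: no past data are stored.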