$J(a) = \frac{1}{2}\,E[(X - a)^2]$ is minimized. As the probability distribution is unknown, it is not possible to compute its mathematical expectation. The gradient of $J$ (i.e., its first derivative) is $\nabla J(a) = -E(X - a)$. A gradient descent algorithm is
$$a_{k+1} = a_k - \gamma_{k+1}\,\nabla J(a_k),$$
where $\gamma_{k+1}$ is a positive scalar.
Replacing the gradient $\nabla J(a_k)$ by the expression $-(X_{k+1} - a_k)$ yields the recursive estimate of the empirical mean,
$$a_{k+1} = a_k + \gamma_{k+1}\,(X_{k+1} - a_k).$$
Note that the expectation of the random variable $-(X_{k+1} - a_k)$ is $-(E(X) - a_k)$, which is exactly the value $\nabla J(a_k)$ of the gradient of the cost function $J$ taken at the current value $a_k$ of the parameter estimate. Therefore, this algorithm
is termed stochastic gradient. In the gradient descent, the gradient of the cost
function has been replaced by a random variable, whose expectation is equal
to this gradient. The stochastic gradient is known at any time, whereas the
“total” gradient depends on the law of X , which is unknown. The stochastic
gradient algorithm has already been mentioned in Chap. 2. We will study it
in more detail in the following.
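As an illustration, the following sketch (in Python) runs this recursion on synthetic Gaussian data; the step size $\gamma_{k+1} = 1/(k+1)$ and the data are illustrative assumptions, not prescribed by the text. With that particular step size, $a_k$ is exactly the running average of the first $k$ observations.

```python
import random

def recursive_mean(samples):
    """Stochastic-gradient estimate of E(X):
    a_{k+1} = a_k + gamma_{k+1} * (x_{k+1} - a_k).

    gamma_{k+1} = 1/(k+1) is an illustrative choice; with it, a_k is the
    running average of the first k samples.
    """
    a = 0.0
    for k, x in enumerate(samples):
        gamma = 1.0 / (k + 1)   # positive scalar step size
        a += gamma * (x - a)    # step opposite to the stochastic gradient -(x - a)
        yield a

if __name__ == "__main__":
    random.seed(0)
    xs = [random.gauss(2.0, 1.0) for _ in range(10_000)]
    estimates = list(recursive_mean(xs))
    print(estimates[-1])        # close to the true expectation 2.0
```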
That algorithm performs the optimization task without prior estimation
of the unknown probability distribution. Actually, the optimization and the
estimation tasks are performed simultaneously. By contrast, in the classical
estimation process, the estimation phase is performed first. In that phase, the
criterion is estimated by the associated empirical criterion; here, the estimate of $J(a) = \frac{1}{2}\,E[(X - a)^2]$ would be $J_N(a) = \frac{1}{2N}\sum_{k=1}^{N}(x_k - a)^2$; then,
the optimization phase is performed using that estimate. Actually, in this
example, the two approaches lead to the same result because the model is
linear with respect to the parameter of interest, namely a . Nevertheless, the
implementations of the two algorithms are different: the implementation of
the stochastic gradient is recursive.
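To make the comparison concrete, the following sketch carries out both routes on the same synthetic data: the classical route forms the empirical criterion $J_N(a)$ and minimizes it in closed form (its minimizer is the sample mean), whereas the recursive route applies the stochastic gradient update sample by sample. The data and the step size $1/(k+1)$ are again illustrative assumptions.

```python
import random

def batch_minimizer(samples):
    """Classical two-phase approach: form the empirical criterion
    J_N(a) = 1/(2N) * sum_k (x_k - a)^2 and minimize it.

    Setting dJ_N/da = -(1/N) * sum_k (x_k - a) = 0 gives the sample mean,
    so no iterative optimizer is needed for this linear-in-a model.
    """
    return sum(samples) / len(samples)

def recursive_minimizer(samples):
    """Stochastic-gradient (recursive) approach with the illustrative step 1/(k+1)."""
    a = 0.0
    for k, x in enumerate(samples):
        a += (x - a) / (k + 1)
    return a

if __name__ == "__main__":
    random.seed(1)
    xs = [random.gauss(-0.5, 2.0) for _ in range(1_000)]
    print(batch_minimizer(xs), recursive_minimizer(xs))  # identical up to rounding
```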
4.3.2 Recursive Estimation of Linear Regression
The stochastic gradient approach that we have just used to estimate the em-
pirical mean is frequently used for linear and nonlinear regression. In the con-
text of linear regression, the algorithm is named recursive least mean squares
(LMS) or Widrow-Hoff algorithm. This algorithm is well known in signal the-
ory, where it is used to compute a linear regression in an adaptive way (see
also Chap. 2).
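A minimal sketch of the LMS (Widrow-Hoff) recursion is given below, ahead of the formal setting; the constant step size mu, the synthetic data, and the omission of a bias term are simplifying assumptions made for illustration only.

```python
import random

def lms(stream, n, mu=0.01):
    """Least-mean-squares (Widrow-Hoff) recursion, sketched for y ~ x . w:
    w_{k+1} = w_k + mu * (y_{k+1} - x_{k+1} . w_k) * x_{k+1}.

    The constant step mu and the absence of a bias term are simplifying
    assumptions for illustration.
    """
    w = [0.0] * n
    for x, y in stream:
        err = y - sum(xi * wi for xi, wi in zip(x, w))    # instantaneous prediction error
        w = [wi + mu * err * xi for wi, xi in zip(w, x)]  # stochastic-gradient step
    return w

if __name__ == "__main__":
    random.seed(2)
    true_w = [1.5, -3.0]
    def stream(n_samples):
        for _ in range(n_samples):
            x = [random.gauss(0, 1) for _ in true_w]
            y = sum(xi * wi for xi, wi in zip(x, true_w)) + random.gauss(0, 0.1)
            yield x, y
    print(lms(stream(20_000), n=2, mu=0.01))  # should approach [1.5, -3.0]
```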
Consider the problem of the minimization of $J(w) = \frac{1}{2}\,E[(Y - Xw - b)^2]$ with respect to $w$, where $X$ is a second-order random vector of dimension $(1, n)$ (i.e., its expectation and covariance matrices are well defined). The vector parameter