Closed-Loop Control Learning - Neural Networks: Methodology and Applications


$$J = \int_0^{\infty} e^{-\alpha t}\, c[x(t), u(t)]\, dt.$$

A stationary policy $\pi$ defines an autonomous dynamical system $dx/dt = f(x, \pi(x))$.

To evaluate policy $\pi$, one must compute the state function

$$J^{\pi}(x) = \int_0^{\infty} e^{-\alpha t}\, c[x(t), \pi(x(t))]\, dt;$$

the integral is computed along the trajectory of the autonomous dynamical system originating from the initial state $x$.
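The evaluation of a policy described above can be approximated numerically by integrating the closed-loop system and accumulating the discounted cost. The following is a minimal sketch, not the book's method: the scalar dynamics `f(x, u) = -x + u`, the cost `c(x, u) = x**2 + u**2`, the linear policy, the Euler step `dt` and the finite `horizon` are all assumptions chosen so that the result can be checked against a closed-form value.

```python
import math

def evaluate_policy(f, c, policy, x0, alpha, dt=1e-3, horizon=20.0):
    """Approximate J^pi(x0) = int_0^inf exp(-alpha t) c[x(t), pi(x(t))] dt.

    The infinite integral is truncated at `horizon` (an assumption that is
    reasonable only when the discounted cost has decayed by then).
    """
    x, total = float(x0), 0.0
    for i in range(int(round(horizon / dt))):
        u = policy(x)
        # Rectangle rule for the discounted running cost.
        total += math.exp(-alpha * i * dt) * c(x, u) * dt
        # Explicit Euler step of dx/dt = f(x, pi(x)).
        x += f(x, u) * dt
    return total

# Hypothetical example system (not from the text).
f = lambda x, u: -x + u
c = lambda x, u: x**2 + u**2
policy = lambda x: -0.5 * x
alpha = 0.1

J_numeric = evaluate_policy(f, c, policy, 1.0, alpha)
# Closed form for this toy case: x(t) = exp(-1.5 t), c = 1.25 exp(-3 t).
J_exact = 1.25 / (alpha + 3.0)
print(J_numeric, J_exact)
```

For this toy system the closed-loop trajectory is known analytically, so the numerical value can be compared with `J_exact`; for a general system only the simulated estimate would be available.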

Therefore, a stationary optimal policy $\pi^*$ satisfies the variational equation

$$\pi^*(x) = \arg\min_{u \,:\, (x,u) \in A} \left[ c(x,u) + \nabla_x J^{\pi^*}(x)\, \frac{dx}{dt} \right] = \arg\min_{u \,:\, (x,u) \in A} \left[ c(x,u) + \nabla_x J^{\pi^*}(x)\, f(x,u) \right].$$
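The minimization in the variational equation can be illustrated by a brute-force search over a grid of admissible controls. This is a sketch under stated assumptions: the dynamics are read as `dx/dt = f(x, u)`, the admissible set is replaced by a hypothetical finite grid `candidates`, and the toy system, cost and quadratic cost function `J(x) = 0.4 * x**2` are illustrative choices that do not come from the text.

```python
import numpy as np

def greedy_control(x, grad_J, f, c, candidates):
    """Return the candidate u that minimizes c(x, u) + grad_J(x) . f(x, u)."""
    scores = [c(x, u) + np.dot(grad_J(x), f(x, u)) for u in candidates]
    return candidates[int(np.argmin(scores))]

# Hypothetical scalar example (not from the text).
f = lambda x, u: -x + u                 # assumed dynamics dx/dt = f(x, u)
c = lambda x, u: x**2 + u**2            # assumed instantaneous cost
grad_J = lambda x: 0.8 * x              # gradient of the assumed J(x) = 0.4 x^2
candidates = np.linspace(-2.0, 2.0, 401)  # grid of controls, step 0.01

u_star = greedy_control(1.0, grad_J, f, c, candidates)
print(u_star)
```

At `x = 1` the bracket equals `1 + u**2 + 0.8 * (u - 1)`, whose minimum is at `u = -0.4`, so the grid search should return a value close to that. In practice `grad_J` would come from differentiating the trained approximator rather than from a known formula.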

That equation is exactly the Hamilton-Jacobi-Bellman (HJB) equation of the control problem. When a neural network approximates the total cost of a policy $\pi$, the network can also be used to compute the gradient of the cost function $\nabla_x J^{\pi^*}$, which can be plugged into the previous formula. Thus, it is possible to derive a training algorithm for the continuous value function $Q$ defined by

$$Q(x,u) = c(x,u) + \nabla_x J^{\pi^*}(x)\, f(x,u)$$

and to use it within a generalized continuous Q-learning algorithm.
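One way to see how $Q$ relates to the cost function is to differentiate $J^{\pi}$ along a trajectory of the closed-loop system. The derivation below is a sketch, not taken from the text; it assumes that the dynamics read $dx/dt = f(x,u)$, that $J^{\pi}$ is differentiable, and that the discounted integral converges.

```latex
\begin{aligned}
J^{\pi}(x(t)) &= \int_t^{\infty} e^{-\alpha (s-t)}\, c[x(s), \pi(x(s))]\, ds, \\
\frac{d}{dt} J^{\pi}(x(t)) &= \alpha\, J^{\pi}(x(t)) - c[x(t), \pi(x(t))]
  && \text{(differentiating the integral)}, \\
\frac{d}{dt} J^{\pi}(x(t)) &= \nabla_x J^{\pi}(x(t))\, f(x(t), \pi(x(t)))
  && \text{(chain rule)}, \\
\Rightarrow\quad c(x, \pi(x)) + \nabla_x J^{\pi}(x)\, f(x, \pi(x)) &= \alpha\, J^{\pi}(x).
\end{aligned}
```

Under these assumptions, $Q(x, \pi^*(x)) = \alpha J^{\pi^*}(x)$ for the optimal policy, so the gap between the two sides could serve as a residual when training an approximator of the cost function.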

Recent publications systematically investigate the use of reinforcement learning to learn an optimal control law when the model is not known. See for instance [Bertsekas et al. 1996] for a general introduction. More recently, [Doya 2000] presents a clear derivation of several reinforcement learning algorithms in the continuous framework and tests them using the inverted pendulum problem as a benchmark.

References

1. Anderson B.D.O., Moore J.B. [1979], Optimal Filtering, Prentice Hall

2. Azencott R., Dacunha-Castelle D. [1984], Séries d'observations irrégulières. Modélisation et prévision, Masson

3. Barto A.G., Sutton R.S., Anderson C.W. [1983], Neuron-like elements that can solve difficult learning control problems, IEEE Trans. on Systems, Man and Cybernetics, 13, pp 835-846

4. Benveniste A., Métivier M., Priouret P. [1987], Algorithmes adaptatifs et approximations stochastiques. Théorie et application à l'identification, au traitement du signal et à la reconnaissance des formes, Masson
