$$J = \int_0^{\infty} e^{-\alpha t}\, c[x(t), u(t)]\, dt.$$
A stationary policy $\pi$ defines an autonomous dynamical system $dx/dt = f(x, \pi(x))$.
To evaluate policy $\pi$, one must compute the state function
$$J_\pi(x) = \int_0^{\infty} e^{-\alpha t}\, c[x(t), \pi(x(t))]\, dt;$$
the integral is computed along the trajectory of the autonomous dynamical system originating from the initial state $x$.
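When a model is available, $J_\pi(x)$ can be approximated numerically by simulating the closed-loop system. The sketch below is illustrative only: it assumes known functions `f`, `policy` and `cost` (hypothetical names), integrates $dx/dt = f(x, \pi(x))$ with explicit Euler steps, and truncates the infinite horizon; the step size and horizon defaults are arbitrary choices. A scalar linear-quadratic case, not taken from the text, is included because its cost has a closed form to compare against.

```python
import numpy as np

def evaluate_policy(x0, f, policy, cost, alpha, dt=1e-3, horizon=20.0):
    """Approximate J_pi(x0) = integral_0^inf exp(-alpha t) c[x(t), pi(x(t))] dt.

    The closed-loop system dx/dt = f(x, pi(x)) is integrated with explicit
    Euler steps, and the infinite horizon is truncated at `horizon`; the
    neglected tail is small when the discounted cost decays well before then.
    """
    x = np.asarray(x0, dtype=float)
    total = 0.0
    for step in range(int(round(horizon / dt))):
        t = step * dt
        u = policy(x)
        total += np.exp(-alpha * t) * cost(x, u) * dt  # left Riemann sum of the integral
        x = x + dt * f(x, u)                           # Euler step along the trajectory
    return total

# Hypothetical scalar test problem: dx/dt = a x + b u, c = q x^2 + r u^2, pi(x) = -k x.
a, b, q, r, k, alpha = 0.0, 1.0, 1.0, 1.0, 1.0, 0.1

def f(x, u):
    return a * x + b * u

def cost(x, u):
    return q * x**2 + r * u**2

def policy(x):
    return -k * x

x0 = 1.0
J_num = evaluate_policy(x0, f, policy, cost, alpha)
# Closed form for this linear closed loop: (q + r k^2) x0^2 / (alpha + 2 (b k - a)).
J_exact = (q + r * k**2) * x0**2 / (alpha + 2 * (b * k - a))
print(J_num, J_exact)
```

The Euler discretisation introduces an error of order `dt`, so the two printed values agree only approximately; a higher-order integrator would tighten the match.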
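Therefore, a stationary optimal policy $\pi$ satisfies the variational equation
$$\pi(x) = \operatorname*{Arg\,min}_{u \,:\, (x,u) \in A} \left[ c(x,u) + \nabla_x J_\pi \cdot \frac{dx}{dt} \right] = \operatorname*{Arg\,min}_{u \,:\, (x,u) \in A} \left[ c(x,u) + \nabla_x J_\pi \cdot f(x,u) \right],$$
where $\nabla_x J_\pi \cdot dx/dt$ is the rate of change of $J_\pi$ along the trajectory.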
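The Arg min formula can be checked on a case where everything is known in closed form. The sketch below uses a hypothetical scalar linear-quadratic problem ($dx/dt = a x + b u$, $c = q x^2 + r u^2$), chosen for illustration and not taken from the text. Its optimal discounted cost is $J(x) = p x^2$, with $p$ the positive root of the discounted Riccati equation $(b^2/r)p^2 + (\alpha - 2a)p - q = 0$. A grid search over $u$ stands in for the minimisation. It should recover the closed-form control $-(b p / r)\,x$, and the residual $\alpha J(x) - \min_u [c + J'(x) f]$ of the discounted HJB equation should be close to zero.

```python
import numpy as np

# Hypothetical scalar example (not from the text): dx/dt = a*x + b*u,
# c(x, u) = q*x**2 + r*u**2, discount rate alpha.
a, b, q, r, alpha = 0.0, 1.0, 1.0, 1.0, 0.1

def f(x, u):
    return a * x + b * u

def c(x, u):
    return q * x**2 + r * u**2

# Optimal cost J(x) = p*x**2, where p is the positive root of
# (b**2/r) p**2 + (alpha - 2a) p - q = 0 (discounted Riccati equation).
p = (-(alpha - 2 * a) + np.sqrt((alpha - 2 * a) ** 2 + 4 * b**2 * q / r)) / (2 * b**2 / r)

def grad_J(x):
    return 2 * p * x  # dJ/dx for J(x) = p x^2

def greedy_action(x, u_grid):
    """Arg min over candidate controls of c(x, u) + grad_J(x) * f(x, u)."""
    h = c(x, u_grid) + grad_J(x) * f(x, u_grid)
    i = np.argmin(h)
    return u_grid[i], h[i]

u_grid = np.linspace(-5.0, 5.0, 100001)  # candidate controls, spacing 1e-4
x = 1.5
u_star, h_min = greedy_action(x, u_grid)
print(u_star, -b * p * x / r)       # grid minimiser vs closed form -b p x / r
print(alpha * p * x**2 - h_min)     # HJB residual alpha*J(x) - min_u[...]
```

A grid search is used only because it is transparent; for a Hamiltonian that is quadratic in $u$, setting its derivative to zero gives the minimiser directly.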
That equation is precisely the Hamilton-Jacobi-Bellman (HJB) equation of the control problem. When a neural network approximates the total cost $J_\pi$ of a policy $\pi$, the network can also compute the gradient $\nabla_x J_\pi$ of that cost with respect to the state (by backpropagation), and this gradient can be plugged into the previous formula. It is thus possible to derive a training algorithm for the continuous value function $Q$ defined by
$$Q(x,u) = c(x,u) + \nabla_x J_\pi \cdot f(x,u),$$
and to use it within a generalized continuous Q-learning algorithm.
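A minimal sketch of computing $Q(x,u)$ from an approximation of $J_\pi$ and then taking the greedy control follows. It rests on several assumptions. The scalar linear system, quadratic cost and initial policy $\pi(x) = -x$ are hypothetical. A quadratic least-squares fit stands in for the neural network, with its input gradient written out by hand rather than obtained by backpropagation. The minimisation over $u$ is a plain grid search. It performs one policy-improvement step and is not the generalized continuous Q-learning algorithm the text refers to.

```python
import numpy as np

# Hypothetical scalar problem and initial policy (assumptions for illustration).
a, b, q, r, alpha = 0.0, 1.0, 1.0, 1.0, 0.1

def f(x, u):
    return a * x + b * u

def c(x, u):
    return q * x**2 + r * u**2

def policy(x):
    return -1.0 * x  # pi(x) = -k x with k = 1

def rollout_costs(x0, dt=1e-3, horizon=20.0):
    """Discounted cost J_pi(x0) for a batch of initial states (Euler + truncation)."""
    x = np.array(x0, dtype=float)
    total = np.zeros_like(x)
    for step in range(int(round(horizon / dt))):
        u = policy(x)
        total += np.exp(-alpha * step * dt) * c(x, u) * dt
        x = x + dt * f(x, u)
    return total

# 1. Fit a differentiable approximator of J_pi. A quadratic in x stands in for
#    the neural network of the text; its input gradient is known in closed form.
xs = np.linspace(-2.0, 2.0, 21)
features = np.stack([np.ones_like(xs), xs, xs**2], axis=1)
w, *_ = np.linalg.lstsq(features, rollout_costs(xs), rcond=None)

def grad_J_hat(x):
    return w[1] + 2.0 * w[2] * x  # d/dx of w0 + w1 x + w2 x^2

# 2. Q(x, u) = c(x, u) + grad_x J_pi(x) * f(x, u)
def Q(x, u):
    return c(x, u) + grad_J_hat(x) * f(x, u)

# 3. Improved policy: greedy with respect to Q over a grid of candidate controls.
u_grid = np.linspace(-5.0, 5.0, 100001)

def improved_policy(x):
    return u_grid[np.argmin(Q(x, u_grid))]

print(w, improved_policy(1.5))
```

Starting from the gain $k = 1$, this single improvement step yields a gain of about 0.95, already close to the optimal value given by the discounted Riccati equation for the same hypothetical problem.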
Recent publications systematically investigate the use of reinforcement learning to learn an optimal control law when the model is not known. See, for instance, [Bertsekas et al. 1996] for a general introduction. More recently, [Doya 2000] presents a clear derivation of several reinforcement learning algorithms in the continuous framework and tests them on the inverted pendulum problem as a benchmark.
References
1. Anderson B.D.O., Moore J.B. [1979], Optimal Filtering, Prentice Hall
2. Azencott R., Dacunha-Castelle D. [1984], Séries d'observations irrégulières. Modélisation et prévision, Masson
3. Barto A.G., Sutton R.S., Anderson C.W. [1983], Neuron-like elements that can solve difficult learning control problems, IEEE Trans. on Systems, Man and Cybernetics, 13, pp. 835-846
4. Benveniste A., Métivier M., Priouret P. [1987], Algorithmes adaptatifs et approximations stochastiques. Théorie et application à l'identification, au traitement du signal et à la reconnaissance des formes, Masson