algorithms have been used efficiently to solve large-scale problems such as the
backgammon game, the lift-planning problem, or the dynamic assignment of
radio frequencies.
5.4.4.2 Reinforcement Learning with Sampling of a Continuous
State-Space
The implementation of approximate reinforcement learning when the value
functions are regular suggests that it could be useful for building an approx-
imate optimal control law for a nonlinear continuous system; that topic was
addressed at the beginning of this chapter as a direct application of supervised
learning and model inversion. Indeed, the Bellman equation is simply a discrete
version of the Hamilton-Jacobi-Bellman (HJB) equation, which is known to be
the variational equation of optimal control with continuous state space and
continuous time.
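For orientation, one standard form of the HJB equation may be recalled; the infinite-horizon discounted cost criterion and the discount rate γ are assumptions made here for illustration, since the text does not fix a cost criterion at this point. For a deterministic system dx/dt = f(x, u) with elementary cost c(x, u), the optimal value function V satisfies

γ V(x) = min_u [ c(x, u) + ∇V(x) · f(x, u) ],

and discretizing this equation in time and state recovers the familiar Bellman equation.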
We just saw that the implementation of reinforcement learning in large
discrete problems is computationally demanding. Therefore, when using
that methodology for continuous control problems, one faces the following
dilemma:

•  A coarse sampling of the state space, or of the feasible state-action set, leads to an inaccurate approximation of the value function, to losing the Markov property of the problem, and possibly to designing a control law that is far from optimal.
•  A fine sampling leads to a combinatorial explosion of the computational complexity.
In order to overcome that difficulty, specific sampling schemes have been proposed
in the literature. One may use variable sampling steps: in autonomous robot-
ics, for instance, space sampling will be fine in key locations (crossings, am-
biguous perceptions) where immediate reactions are necessary (avoidance of
a new obstacle), but rough in most regions where optimal navigation is just
routine. If the problem allows multiscale sampling, it may be an efficient way
to determine an optimal policy, as sketched below.
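As an illustration, here is a minimal sketch of such a variable-resolution sampling scheme in Python. All names and numerical values are hypothetical: the key regions are hand-specified boxes in a 2-D workspace where a finer grid step is used, mimicking the crossings of the robot-navigation example.

```python
import numpy as np

def build_samples(bounds, coarse_step, fine_step, key_regions):
    """Sample a 2-D state space: fine near key regions, coarse elsewhere."""
    xs = np.arange(bounds[0][0], bounds[0][1], coarse_step)
    ys = np.arange(bounds[1][0], bounds[1][1], coarse_step)
    samples = [(x, y) for x in xs for y in ys]      # coarse background grid
    for (x0, x1, y0, y1) in key_regions:            # refine inside each key box
        fx = np.arange(x0, x1, fine_step)
        fy = np.arange(y0, y1, fine_step)
        samples += [(x, y) for x in fx for y in fy]
    return np.unique(np.array(samples), axis=0)     # drop duplicate points

# Example: a 10 m x 10 m workspace with one "crossing" that needs fine sampling.
states = build_samples(bounds=[(0.0, 10.0), (0.0, 10.0)],
                       coarse_step=1.0, fine_step=0.1,
                       key_regions=[(4.0, 6.0, 4.0, 6.0)])
print(len(states), "sample points")
```

With these illustrative parameters, the mixed grid contains roughly five hundred points, whereas a uniform grid at the fine step would require ten thousand.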
5.4.4.3 Q-Learning in a Continuous Space
Let us consider the following controlled dynamical system, with continuous
state space and continuous time:

dx/dt = f(x, u).

(A deterministic system is considered here in order to keep the notation short and simple.)
The elementary cost c(x, u) is associated with each feasible state-action pair
(x, u). That function allows the total cost to be defined as an integral functional
that depends on the state-action trajectory,

J = ∫ c(x(t), u(t)) dt,

the integral being taken along the trajectory generated by the control law.
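To make the setting concrete, here is a minimal Q-learning sketch on a time- and state-sampled version of such a system. The dynamics f, the cost c, the grid, the action set, and all learning parameters below are illustrative assumptions, not taken from the text; the update is the usual temporal-difference rule, with the integrated elementary cost c(x, u)·dt playing the role of the immediate cost.

```python
import numpy as np

def f(x, u):                  # hypothetical dynamics dx/dt = f(x, u)
    return u - 0.5 * x

def c(x, u):                  # hypothetical elementary cost c(x, u)
    return x**2 + 0.1 * u**2

dt, gamma, alpha = 0.1, 0.95, 0.1        # time step, discount, learning rate
x_grid = np.linspace(-2.0, 2.0, 41)      # sampled continuous state space
actions = np.array([-1.0, 0.0, 1.0])     # discretized feasible actions
Q = np.zeros((len(x_grid), len(actions)))

def nearest(x):                          # index of the closest state sample
    return int(np.argmin(np.abs(x_grid - x)))

rng = np.random.default_rng(0)
x = 1.5
for _ in range(20000):
    i = nearest(x)
    # epsilon-greedy action choice (greedy = minimum cost-to-go)
    a = rng.integers(len(actions)) if rng.random() < 0.1 else int(np.argmin(Q[i]))
    u = actions[a]
    x_next = x + dt * f(x, u)            # Euler step of the dynamics
    j = nearest(x_next)
    # temporal-difference update with the integrated cost c(x, u) * dt
    Q[i, a] += alpha * (c(x, u) * dt + gamma * Q[j].min() - Q[i, a])
    # restart from a random state when the trajectory leaves the grid
    x = x_next if abs(x_next) < 2.0 else rng.uniform(-2.0, 2.0)

print("greedy action at x = 1.0:", actions[int(np.argmin(Q[nearest(1.0)]))])
```

Note that, since the criterion is a cost, the greedy policy takes the argmin of Q rather than the argmax used with rewards.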