algorithms have been used efficiently to solve large-scale problems such as the
backgammon game, the lift-planning problem, or the dynamic assignment of
radio frequencies.
5.4.4.2 Reinforcement Learning with Sampling of a Continuous
State-Space
The implementation of approximate reinforcement learning when the value
functions are regular suggests that it could be useful for building an approx-
imate optimal control law for a nonlinear continuous system; that topic was
addressed at the beginning of this chapter as a direct application of supervised
learning and model inversion. Indeed, the Bellman equation is simply a discrete
version of the Hamilton-Jacobi-Bellman (HJB) equation, which is known to be
the variational equation of optimal control with continuous state space and
continuous time.
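For orientation, one standard form of the HJB equation may be recalled; the infinite-horizon discounted cost criterion and the discount rate γ are assumptions made here for illustration, since the text does not fix a cost criterion at this point. For a deterministic system dx/dt = f(x, u) with elementary cost c(x, u), the optimal value function V satisfies

γ V(x) = min_u [ c(x, u) + ∇V(x) · f(x, u) ],

and discretizing this equation in time and state recovers the familiar Bellman equation.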
We just saw that the implementation of reinforcement learning in large
discrete problems is computationally demanding. Therefore, when using
that methodology for continuous control problems, one faces the following
dilemma:

•  A coarse sampling of the state space, or of the feasible state-action set, leads to an inaccurate approximation of the value function, to losing the Markov property of the problem, and possibly to designing a control law that is far from optimal.
•  A fine sampling leads to a combinatorial explosion of the computational complexity.
In order to overcome that difficulty, specific sampling schemes have been proposed
in the literature. One may use variable sampling steps: in autonomous robot-
ics, for instance, space sampling will be fine in key locations (crossings, am-
biguous perceptions) where immediate reactions are necessary (avoidance of
a new obstacle), but rough in most regions where optimal navigation is just
routine. If the problem allows multiscale sampling, it may be an efficient way
to determine an optimal policy, as sketched below.
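As an illustration, here is a minimal sketch of such a variable-resolution sampling scheme in Python. All names and numerical values are hypothetical: the key regions are hand-specified boxes in a 2-D workspace where a finer grid step is used, mimicking the crossings of the robot-navigation example.

```python
import numpy as np

def build_samples(bounds, coarse_step, fine_step, key_regions):
    """Sample a 2-D state space: fine near key regions, coarse elsewhere."""
    xs = np.arange(bounds[0][0], bounds[0][1], coarse_step)
    ys = np.arange(bounds[1][0], bounds[1][1], coarse_step)
    samples = [(x, y) for x in xs for y in ys]      # coarse background grid
    for (x0, x1, y0, y1) in key_regions:            # refine inside each key box
        fx = np.arange(x0, x1, fine_step)
        fy = np.arange(y0, y1, fine_step)
        samples += [(x, y) for x in fx for y in fy]
    return np.unique(np.array(samples), axis=0)     # drop duplicate points

# Example: a 10 m x 10 m workspace with one "crossing" that needs fine sampling.
states = build_samples(bounds=[(0.0, 10.0), (0.0, 10.0)],
                       coarse_step=1.0, fine_step=0.1,
                       key_regions=[(4.0, 6.0, 4.0, 6.0)])
print(len(states), "sample points")
```

With these illustrative parameters, the mixed grid contains roughly five hundred points, whereas a uniform grid at the fine step would require ten thousand.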
5.4.4.3 Q-Learning in a Continuous Space
Let us consider the following controlled dynamical system, with continuous
state space and continuous time:

dx/dt = f(x, u).

(A deterministic system is considered here in order to keep the notation short and simple.)
The elementary cost c(x, u) is associated with each feasible state-action pair
(x, u). That function allows the total cost to be defined as an integral functional
that depends on the state-action trajectory,

J = ∫ c(x(t), u(t)) dt,

the integral being taken along the trajectory generated by the control law.
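To make the setting concrete, here is a minimal Q-learning sketch on a time- and state-sampled version of such a system. The dynamics f, the cost c, the grid, the action set, and all learning parameters below are illustrative assumptions, not taken from the text; the update is the usual temporal-difference rule, with the integrated elementary cost c(x, u)·dt playing the role of the immediate cost.

```python
import numpy as np

def f(x, u):                  # hypothetical dynamics dx/dt = f(x, u)
    return u - 0.5 * x

def c(x, u):                  # hypothetical elementary cost c(x, u)
    return x**2 + 0.1 * u**2

dt, gamma, alpha = 0.1, 0.95, 0.1        # time step, discount, learning rate
x_grid = np.linspace(-2.0, 2.0, 41)      # sampled continuous state space
actions = np.array([-1.0, 0.0, 1.0])     # discretized feasible actions
Q = np.zeros((len(x_grid), len(actions)))

def nearest(x):                          # index of the closest state sample
    return int(np.argmin(np.abs(x_grid - x)))

rng = np.random.default_rng(0)
x = 1.5
for _ in range(20000):
    i = nearest(x)
    # epsilon-greedy action choice (greedy = minimum cost-to-go)
    a = rng.integers(len(actions)) if rng.random() < 0.1 else int(np.argmin(Q[i]))
    u = actions[a]
    x_next = x + dt * f(x, u)            # Euler step of the dynamics
    j = nearest(x_next)
    # temporal-difference update with the integrated cost c(x, u) * dt
    Q[i, a] += alpha * (c(x, u) * dt + gamma * Q[j].min() - Q[i, a])
    # restart from a random state when the trajectory leaves the grid
    x = x_next if abs(x_next) < 2.0 else rng.uniform(-2.0, 2.0)

print("greedy action at x = 1.0:", actions[int(np.argmin(Q[nearest(1.0)]))])
```

Note that, since the criterion is a cost, the greedy policy takes the argmin of Q rather than the argmax used with rewards.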