Closed-Loop Control Learning - Neural Networks: Methodology and Applications


5.3 Dynamic Programming and Optimal Control

5.3.1 Example of a Deterministic Problem in a Discrete State Space

Let us return to the simple example of a controlled dynamical system shown in
Fig. 4.1 of the previous chapter. That example was described at the beginning
of the section "Formal definition and examples of discrete time controlled
dynamical systems". To define a control problem, we must define the criterion,
that is, a cost function to be minimized. In this example, we can choose a
location in the labyrinth as a target to be reached as quickly as possible. In
that case, we associate a unit cost with each triple (current state, current
action, next state), except for any triple whose next state is state 35 (the
target): such a triple is assigned a large negative cost −A (a reward).
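The cost structure just described can be sketched in a few lines of Python. This is only an illustration: the value chosen for A, the state numbers other than 35, and the action name "east" are assumptions, since the text only says that the reward is high and does not reproduce the labyrinth of Fig. 4.1.

```python
TARGET_STATE = 35   # the target state named in the text
REWARD_A = 100.0    # assumed value of A; the text only says it is "high"


def transition_cost(state, action, next_state,
                    target=TARGET_STATE, reward=REWARD_A):
    """Cost of one triple (current state, current action, next state).

    Every triple costs 1, except a triple that ends in the target,
    which costs -A (that is, it earns a reward of A).
    """
    return -reward if next_state == target else 1.0


def path_cost(triples):
    """Total cost of a trajectory given as a list of triples."""
    return sum(transition_cost(s, a, n) for s, a, n in triples)


# Hypothetical three-step trajectory that ends in the target state.
example_path = [(32, "east", 33), (33, "east", 34), (34, "east", 35)]
total = path_cost(example_path)  # 1 + 1 - 100
```

Because every step that does not reach the target adds 1 to the total, minimizing this cost favors the trajectories that reach state 35 in the fewest steps.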

The optimal control problem consists in designing a closed-loop control law. In
operational research, for discrete-time problems on a discrete state space, the
terms policy or strategy are preferred. A policy is a function from the state
space E to the control set (or action set) A that associates an action with
each current state. A couple consisting of a state and an action that can be
carried out from that state is called a feasible (state-action) couple.
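As a rough illustration of these definitions, a policy on a finite state space can be stored as a lookup table and checked against the set of feasible couples. The four-state corridor, the action names, and the helper functions below are hypothetical and are not the labyrinth of Fig. 4.1.

```python
# Hypothetical corridor with four states; each state lists the actions
# that can be carried out from it (the feasible state-action couples).
FEASIBLE = {
    0: {"east"},
    1: {"west", "east"},
    2: {"west", "east"},
    3: {"west"},
}


def is_feasible(state, action):
    """True if (state, action) is a feasible couple."""
    return action in FEASIBLE.get(state, set())


def is_valid_policy(policy):
    """A policy must associate a feasible action with every state of E."""
    return all(s in policy and is_feasible(s, policy[s]) for s in FEASIBLE)


# A stationary policy: a function from the state space to the action set,
# stored here as a dictionary.
policy = {0: "east", 1: "east", 2: "east", 3: "west"}
```

Storing the policy as a table is only practical because the state space is finite and small; the definition itself only requires a function from states to feasible actions.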

Actually, for finite horizon problems, it is natural to consider nonstationary
policies. If we are traveling in a dangerous country, at the beginning of the
day we would surely choose to advance as quickly as possible. Conversely, at
the end of the day, we would rather move towards a safe place to spend the
night. At a given location, these two directions are generally not the same.
Therefore, in finite horizon problems, we must consider nonstationary policies,
which are functions of both the current time and the current state and which
take their values in the set of feasible actions.
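The traveler's dilemma can be reproduced on a toy problem. The sketch below uses backward induction, the standard finite-horizon dynamic programming recursion, which is not spelled out in the passage above. Everything specific in it is an assumption made for illustration: a four-state corridor where state 0 is a safe place and state 3 is the destination, a horizon of two steps, a unit cost per step, and the end-of-day costs in TERMINAL_COST.

```python
STATES = [0, 1, 2, 3]                        # 0: safe place, 3: destination
MOVES = {"west": -1, "stay": 0, "east": +1}  # hypothetical action set
HORIZON = 2                                  # assumed number of time steps
# Assumed end-of-day costs: being caught in the open (states 1, 2) is costly.
TERMINAL_COST = {0: 0.0, 1: 20.0, 2: 20.0, 3: -10.0}


def feasible_actions(state):
    """Actions that keep the traveler inside the corridor."""
    return [a for a, step in MOVES.items() if 0 <= state + step <= 3]


def backward_induction():
    """Return (policy, value): policy[t][s] is the action chosen at time t
    in state s, and value[s] is the optimal total cost from time 0."""
    value = dict(TERMINAL_COST)          # cost-to-go at the final time
    policy = [None] * HORIZON
    for t in reversed(range(HORIZON)):
        new_value, policy_t = {}, {}
        for s in STATES:
            # Unit cost for the step, plus the cost-to-go of the next state.
            costs = {a: 1.0 + value[s + MOVES[a]] for a in feasible_actions(s)}
            best = min(costs, key=costs.get)
            policy_t[s] = best
            new_value[s] = costs[best]
        value, policy[t] = new_value, policy_t
    return policy, value


policy, value = backward_induction()
early_action = policy[0][1]   # action in state 1 with two steps left
late_action = policy[1][1]    # action in state 1 with one step left
```

Under these assumed costs, the action computed for state 1 is "east" at time 0, when the destination can still be reached, and "west" at time 1, when only the safe place is within reach. The same location therefore calls for different actions at different times, which a stationary policy cannot express.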

Fig. 5.11. Building up a closed-loop control law using recurrent back-propagation through an Elman network
