Closed-Loop Control Learning - Neural Networks: Methodology and Applications

Information Technology Reference

In-Depth Information

We may rewrite it as in the case of the shortest stochastic path:

c ( x,π ( x ) ,X 1 )+ ∞

J π ( x )= E P π,x

α k c ( X k ,π ( X k ) ,X k +1 )

.

k =1

The Infinite horizon Markov decision problem with discount rate α consists

in finding an optimal stationary policy, which minimizes the cost function J π .

In the next sections, when no ambiguity can arise, we will just denote by

N the finite horizon or by α the discount rate without further explanation. We

will omit the superscript in the cost function in order to alleviate notations.

A discount rate infinite horizon problem may be transformed into a short-

est stochastic path problem as follows: every elementary cost is shifted by an

appropriate constant to be made positive. Then an artificial terminal state is

added, and transitions are changed: any probability transition of the initial

problem is multiplied by α and the other possibility is to be sent into the

terminal state (cemetery state). All stationary policies of the initial problem

are proper stationary policies of the shortest stochastic path problem thus

defined. The costs of the two problems (infinite horizon with discount rate

and shortest stochastic path) are equal. This transformation is rather formal.

Thus, methods that are used to solve shortest stochastic path problems can

be used to solve more general infinite horizon problems with discount rate.

Similarly, given a shortest stochastic path problem, it can be transformed,

in practice, into an infinite horizon problem with average cost and discount

rate by random restart of the state as soon as the terminal state is reached.

5.3.4 Finite Horizon Dynamic Programming

5.3.4.1 Bellman's Optimality Principle

The previous definition of J 0 ,N

π

can be transformed into the following:

( x )=

x 1 ∈E

J 0 ,N

π

P π ( x, 0) ( x,x 1 )

⎡

⎣ c ( x,π ( x, 0) ,x 1 )+

( x 2 ,...x N )

×

P π ( x 1 , 1) ( x 1 ,x 2 ) ...P π ( x N− 1 ,N− 1)

c ( x k ,π ( x k ,k ) ,x k +1 )+ C ( x N ) ,

( x N− 1 ,x N ) N− 1

×

k =1

which amounts to

( x )=

x 1 ∈E

J 0 , 1

π

P π ( x, 0) ( x,x 1 )[ c ( x,π ( x, 1) ,x 1 )+ J 1 ,N

( x 1 )]

π

Neural Networks: Methodology and Applications

Search WWH ::

Custom Search

Home