Closed-Loop Control Learning - Neural Networks: Methodology and Applications

Information Technology Reference

In-Depth Information

P π ( x,y )= P π ( x ) ( x,y ) .

The elementary cost is an application c from A

E into R ;the terminal

cost is an application C from E into R .

5.3.3.2 Finite Horizon Markov Decision Problem

A strictly positive integer N called horizon is given. It is the number of au-

thorized transitions. The problem is to find a minimal cost trajectory from

time 0 to time N .

To any policy π and any horizon N , we associate the cost function J 0 ,N

from E into R . That cost function takes the state x as input and returns the

expected cost of the random N -trajectory from initial state x according to

the law of the π -controlled Markov chain. J 0 ,N

is computed by the following

relation:

J 0 ,N

( x )=

P π ( x, 0) ( x,x 1 ) P π ( x 1 , 1) ( x 1 ,x 2 )

···

P π ( x N− 1 ,N− 1)

( x k +1 ,...x N ) ∈E N−k

( x N− 1 ,x N ) c ( x,π ( x, 0) ,x 1 )

+ N− 1

c ( x k ,π ( x k ,k ) ,x k +1 )+ C ( x N ) .

k =1

More generally, the cost J k,N

is defined from time k according to

J k,N

( x )=

P π ( x,k ) ( x,x k +1 ) P π ( x k +1 ,k +1 )( x k +1 ,x k +2 )

···

( x k +1 ,...x N ) ∈E N−k

P π ( x N− 1 ,N− 1) ( x N− 1 ,x N ) c ( x,π ( x,k ) ,x k +1 )

N−

c ( x k ,π ( x k ,k ) ,x k +1 )+ C ( x N ) .

k = k +1

Finite Horizon Markov Decision Problem

The N -horizon Markov decision problem consists in finding the optimal policy

π ∗ that minimizes the cost function J 0 ,N

5.3.3.3 Shortest Stochastic Path Problem

The problem that consists in reaching a target state x ∗ as fast as possible

is called the shortest stochastic path problem [Bertsekas et al. 1996]. In that

kind of problem, there is a single terminal state, denoted by x ∗ , such that,

for any feasible action, the only possible transition from that state is the

identical transition x ∗ →

x ∗ . In addition, it is assumed that there exists at

least one stationary policy, which enables to reach, with non-zero probability,

Neural Networks: Methodology and Applications

Search WWH ::

Custom Search

Home