Information Technology Reference
In-Depth Information
P π ( x,y )= P π ( x ) ( x,y ) .
The elementary cost is an application c from A
×
E into R ;the terminal
cost is an application C from E into R .
5.3.3.2 Finite Horizon Markov Decision Problem
A strictly positive integer N called horizon is given. It is the number of au-
thorized transitions. The problem is to find a minimal cost trajectory from
time 0 to time N .
To any policy π and any horizon N , we associate the cost function J 0 ,N
π
from E into R . That cost function takes the state x as input and returns the
expected cost of the random N -trajectory from initial state x according to
the law of the π -controlled Markov chain. J 0 ,N
π
is computed by the following
relation:
J 0 ,N
π
( x )=
P π ( x, 0) ( x,x 1 ) P π ( x 1 , 1) ( x 1 ,x 2 )
···
P π ( x N− 1 ,N− 1)
( x k +1 ,...x N ) ∈E N−k
×
( x N− 1 ,x N ) c ( x,π ( x, 0) ,x 1 )
+ N− 1
c ( x k ( x k ,k ) ,x k +1 )+ C ( x N ) .
k =1
More generally, the cost J k,N
π
is defined from time k according to
J k,N
π
( x )=
P π ( x,k ) ( x,x k +1 ) P π ( x k +1 ,k +1 )( x k +1 ,x k +2 )
···
( x k +1 ,...x N ) ∈E N−k
×
P π ( x N− 1 ,N− 1) ( x N− 1 ,x N ) c ( x,π ( x,k ) ,x k +1 )
N−
1
c ( x k ( x k ,k ) ,x k +1 )+ C ( x N ) .
+
k = k +1
Finite Horizon Markov Decision Problem
The N -horizon Markov decision problem consists in finding the optimal policy
π that minimizes the cost function J 0 ,N
π
.
5.3.3.3 Shortest Stochastic Path Problem
The problem that consists in reaching a target state x as fast as possible
is called the shortest stochastic path problem [Bertsekas et al. 1996]. In that
kind of problem, there is a single terminal state, denoted by x , such that,
for any feasible action, the only possible transition from that state is the
identical transition x
x . In addition, it is assumed that there exists at
least one stationary policy, which enables to reach, with non-zero probability,
Search WWH ::




Custom Search