Information Technology Reference
In-Depth Information
We may rewrite it as in the case of the shortest stochastic path:
c ( x,π ( x ) ,X 1 )+
J π ( x )= E P π,x
α k c ( X k ( X k ) ,X k +1 )
.
k =1
The Infinite horizon Markov decision problem with discount rate α consists
in finding an optimal stationary policy, which minimizes the cost function J π .
In the next sections, when no ambiguity can arise, we will just denote by
N the finite horizon or by α the discount rate without further explanation. We
will omit the superscript in the cost function in order to alleviate notations.
A discount rate infinite horizon problem may be transformed into a short-
est stochastic path problem as follows: every elementary cost is shifted by an
appropriate constant to be made positive. Then an artificial terminal state is
added, and transitions are changed: any probability transition of the initial
problem is multiplied by α and the other possibility is to be sent into the
terminal state (cemetery state). All stationary policies of the initial problem
are proper stationary policies of the shortest stochastic path problem thus
defined. The costs of the two problems (infinite horizon with discount rate
and shortest stochastic path) are equal. This transformation is rather formal.
Thus, methods that are used to solve shortest stochastic path problems can
be used to solve more general infinite horizon problems with discount rate.
Similarly, given a shortest stochastic path problem, it can be transformed,
in practice, into an infinite horizon problem with average cost and discount
rate by random restart of the state as soon as the terminal state is reached.
5.3.4 Finite Horizon Dynamic Programming
5.3.4.1 Bellman's Optimality Principle
The previous definition of J 0 ,N
π
can be transformed into the following:
( x )=
x 1 ∈E
J 0 ,N
π
P π ( x, 0) ( x,x 1 )
c ( x,π ( x, 0) ,x 1 )+
( x 2 ,...x N )
×
P π ( x 1 , 1) ( x 1 ,x 2 ) ...P π ( x N− 1 ,N− 1)
c ( x k ( x k ,k ) ,x k +1 )+ C ( x N ) ,
( x N− 1 ,x N ) N− 1
×
k =1
which amounts to
( x )=
x 1 ∈E
J 0 , 1
π
P π ( x, 0) ( x,x 1 )[ c ( x,π ( x, 1) ,x 1 )+ J 1 ,N
( x 1 )]
π
Search WWH ::




Custom Search