Information Technology Reference
In-Depth Information
the terminal state from any state. Such stationary policies are called proper
stationary policies. Therefore, the terminal state is the equilibrium state (ac-
tually deterministic) of the controlled Markov chain that is associated to any
proper stationary policy.
For infinite horizon problems, because elementary costs are stationary and
without any terminal cost, there is no point in looking for an optimal non-
stationary policy. For a given state, the optimal action does not depend on
current time.
We set that elementary cost of the identical transition from terminal state
to zero, and the elementary cost of any other transition to strictly positive
values; therefore, the latter is lower bounded by a strictly positive constant
since state space is finite.
The expected total cost of a stationary policy π is defined by
J π ( x ) =
lim
N→∞
P π ( x,x 1 ) P π ( x 1 ,x 2 ) ...P π ( x N− 1 ,x N )
( x 1 ,...x N ) ∈E N
c ( x,π ( x ) ,x 1 )+ N− 1
c ( x k ( x k ) ,x k +1 ) .
×
k =1
This may be written more formally using probability theory notation
c ( x,π ( x ) ,X 1 )+
c ( X k ( X k ) ,X k +1 ) ,
J π ( x )= E P π,x
k =1
where P π,x is the probability distribution of the Markov chain that is associ-
ated to the stationary policy π and the initial state x .
One infers that if a stationary policy is not proper, there exists at least
one initial state such that the expected total cost from that state is infinite.
The shortest stochastic path problem consists in finding optimal policy π
that minimizes the cost function J π .
5.3.3.4 Infinite Horizon Problem with Discounted Cost
Arealnumber α strictly between 0 and 1 is given. It is called the discount
rate.
To any stationary policy π and any discount rate α we associate a cost
function J π from E into R . It takes the state x as input and returns the
expected cost from initial state x for the Markov chain with transition prob-
ability kernel P π .
J π ( x ) = lim
N→∞
P π ( x,x 1 ) P π ( x 1 ,x 2 ) ...P π ( x N− 1 ,x N )
( x 1 ,...x N ) ∈E N
c ( x,π ( x ) ,x 1 )+ N− 1
α k c ( x k ( x k ) ,x k +1 ) .
×
k =1
Search WWH ::




Custom Search