Information Technology Reference
In-Depth Information
5.3.5 Infinite-Horizon Dynamic Programming with Discounted
Cost
5.3.5.1 Bellman's Optimality Principle
Similarly, the definition of J π
may be modified to get the following relation:
J α ( x )= E p π,x [ c ( x,π ( x ) ,X 1 )+ αJ π ( X 1 )] .
We express now the fact that the optimal policy π
is better then the nonsta-
tionary policy which consists in first implementing action a from initial state
x, and next keep the application of stationary optimal policy. Thus, we obtain
the following relation:
J π ( x )=
E P u ( x,. ) [ c (( x,u ) ,X 1 )+ αJ π ( X 1 )] ,
min
u/ ( x,u ) A
which expresses Bellman's optimality principle in the infinite horizon with
discount rate framework. We introduce a value function Q, which is deduced
from the cost function J , just as in the previously considered case of finite
horizon problem. The value function is defined on the set of feasible (state-
action) couples,
Q J ( x,u )=
y∈E
P u ( x,y )[ c ( x,u,y )+ αJ ( y )] .
Retaining that definition of the value function, Bellman's optimality principle
may be written in the simpler way: J ( x )=min u/ ( x,u ) A Q J ×
( x,u ). That
Bellman variational equation is a fixed-point equation that characterizes the
optimal cost function J π . It does not provide an immediate algorithm to
compute the optimal policy within a finite number of computation steps, as it
was the case for finite horizon problems and dynamic programming. On the
other hand, the following characterization theorem is easy to prove [Bertsekas
et al. 1996].
Theorem. The Bellman variational equation has a single solution. This
unique solution is the optimal cost function J π .
This theorem is proven using the mathematical technique of contraction .
That technique not only provides an existence and uniqueness mathematical
proof: it can be applied to prove the convergence towards solution of practical
algorithms. Those algorithms are iterative algorithms that will be described
in following sections. In order to alleviate notations, from now on, we omit
writing α as an index.
Search WWH ::




Custom Search