When modeling that example, it is more natural to take a unit cost for
each (state, action) pair and to choose a terminal cost equal to −A for the
target 35 and to +A for any other terminal state. Thus, one gets the following
values for the previous trajectories:

J_N(ω₁) = 10 + A,  J_N(ω₂) = 10 + A,  J_N(ω₃) = 10 + A,  J_N(ω₄) = 10 − A.
Unfortunately, the horizon is not precisely known in most applications. There-
fore, infinite-horizon problems are often considered. In such problems, it is
not always possible to define the total cost as the sum of the elementary costs
of all steps: that sum may diverge when the number of steps is infinite. There
are several possible ways of defining the total cost of an infinite-length
trajectory.
One can define, when it exists, the limit of the average cost over the first
N steps of the trajectory as N goes to infinity. In our simple problem, that
criterion does not make sense: it amounts to assigning the cost −A to every
trajectory that ends on the desired target, whatever the number of steps needed
to reach it, and the cost 1 to any other trajectory. Trajectories that reach
the desired target quickly are therefore not favored.
When the problem is to reach a specified target within a finite number of
steps, it is possible to take the effective sum of the elementary costs as the
total cost, just as in finite-horizon problems. That is the case in our example.
In the general case, we suggest taking the discounted cost as the global
criterion to minimize. That cost function is inspired by economics, where
future costs are discounted at a discount rate α, with 0 < α < 1: the
elementary cost incurred at step n is weighted by αⁿ.
Thus, in our problem, for an infinite-horizon model, we have

J_α(ω₁) = J_α(ω₂) = J_α(ω₃) = 1 + α + α² + ··· = 1/(1 − α),

J_α(ω₄) = 1 + α + α² + ··· + α⁹ − Aα¹⁰ = (1 − α¹⁰)/(1 − α) − Aα¹⁰.
Thus, trajectories that reach the target quickly are favored.
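To make the comparison concrete, here is a minimal Python sketch of the discounted-cost computation above; the numerical values α = 0.9 and A = 100 are arbitrary choices for illustration and are not taken from the example.

```python
# A minimal sketch of the discounted-cost computation above; the values of
# alpha and A are arbitrary choices for illustration, not from the text.

def discounted_cost(step_costs, alpha, terminal_cost=0.0):
    """Discounted cost of a finite trajectory: sum_n alpha^n * c_n,
    with the terminal cost discounted at the step where the trajectory ends."""
    total = sum(c * alpha**n for n, c in enumerate(step_costs))
    return total + terminal_cost * alpha**len(step_costs)

alpha, A = 0.9, 100.0

# Trajectory omega_4: ten unit-cost steps, then the target (terminal cost -A).
j_omega4 = discounted_cost([1.0] * 10, alpha, terminal_cost=-A)

# Trajectories omega_1..omega_3 wander forever; their cost is the limit
# 1 + alpha + alpha^2 + ... = 1/(1 - alpha) of the geometric series.
j_wander = 1.0 / (1.0 - alpha)

print(j_omega4)   # (1 - alpha^10)/(1 - alpha) - A*alpha^10, about -28.35
print(j_wander)   # 10.0
```

With these illustrative values, the trajectory that reaches the target gets a strongly negative cost, while the wandering trajectories pay the full 1/(1 − α).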
The problem consists in finding an optimal policy π such that the total
cost of the state-action trajectory associated with that policy is minimal for
any initial state.
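To illustrate what "minimal for any initial state" means, here is a small sketch comparing two policies on a hypothetical deterministic chain; the chain, the action names, and the constants below are invented for the illustration and do not come from the example.

```python
# A toy illustration (not from the text) of comparing policies by the total
# discounted cost they induce from every initial state. States 0..5 form a
# deterministic chain; state 5 is the target; actions move right or stay.

ALPHA, A, TARGET = 0.9, 100.0, 5

def step(state, action):
    """Hypothetical deterministic dynamics: 'right' moves one cell toward
    the target, 'stay' leaves the state unchanged."""
    return min(state + 1, TARGET) if action == "right" else state

def policy_cost(policy, state, max_steps=200):
    """Roll out the policy and accumulate the discounted cost: unit cost
    per step, terminal cost -A on reaching the target."""
    total, discount = 0.0, 1.0
    for _ in range(max_steps):
        if state == TARGET:
            return total - A * discount
        state = step(state, policy[state])
        total += discount          # unit elementary cost for this step
        discount *= ALPHA
    return total                   # target never reached within max_steps

go = {s: "right" for s in range(TARGET)}    # heads straight for the target
lazy = {s: "stay" for s in range(TARGET)}   # never reaches it

# The "go" policy has a lower discounted cost from every initial state.
for s in range(TARGET):
    assert policy_cost(go, s) < policy_cost(lazy, s)
```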
5.3.2 Example of a Markov Decision Problem
A Markov decision problem (MDP) is the generalization of the previous problem
to a stochastic framework. Randomness is introduced in the state evolution
model and in the costs. The total cost J is then a random variable. We have to
minimize a functional, which is just the mathematical expectation¹ of that
random variable, J̄ = E(J).
¹ Let us recall that the mathematical expectation of a random variable is the
average of that random variable over its probability distribution. Since
dynamics are considered here, the probability law is defined on the set of
trajectories.
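Since the probability law is defined on the set of trajectories, the expectation J̄ can be approximated by averaging the discounted costs of sampled trajectories. The following sketch does this under invented stochastic dynamics (a chain where the chosen move succeeds only with probability 0.8); all names and constants are assumptions made for the illustration.

```python
# A minimal sketch of what "minimizing the expectation of the total cost"
# means in the stochastic case: the model below (a chain where the chosen
# move succeeds with probability 0.8) is invented for illustration.
import random

ALPHA, A, TARGET, N_RUNS = 0.9, 100.0, 5, 10_000

def noisy_step(state, action):
    """Hypothetical stochastic dynamics: the action succeeds with
    probability 0.8; otherwise the state does not change."""
    if action == "right" and random.random() < 0.8:
        return min(state + 1, TARGET)
    return state

def rollout_cost(start, max_steps=500):
    """Discounted cost of one random trajectory under the 'always move
    right' policy; this total cost J is a random variable."""
    state, total, discount = start, 0.0, 1.0
    for _ in range(max_steps):
        if state == TARGET:
            return total - A * discount
        state = noisy_step(state, "right")
        total += discount
        discount *= ALPHA
    return total

# Monte Carlo estimate of the expectation over the trajectory distribution.
j_bar = sum(rollout_cost(0) for _ in range(N_RUNS)) / N_RUNS
print(f"estimated expected cost from state 0: {j_bar:.2f}")
```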
 