Information Technology Reference
In-Depth Information
P
π
(
x,y
)=
P
π
(
x
)
(
x,y
)
.
The
elementary cost
is an application
c
from
A
×
E
into
R
;the
terminal
cost
is an application
C
from
E
into
R
.
5.3.3.2 Finite Horizon Markov Decision Problem
A strictly positive integer
N
called horizon is given. It is the number of au-
thorized transitions. The problem is to find a minimal cost trajectory from
time 0 to time
N
.
To any policy
π
and any horizon
N
, we associate the cost function
J
0
,N
π
from
E
into
R
. That cost function takes the state
x
as input and returns the
expected cost of the random
N
-trajectory from initial state
x
according to
the law of the
π
-controlled Markov chain.
J
0
,N
π
is computed by the following
relation:
J
0
,N
π
(
x
)=
P
π
(
x,
0)
(
x,x
1
)
P
π
(
x
1
,
1)
(
x
1
,x
2
)
···
P
π
(
x
N−
1
,N−
1)
(
x
k
+1
,...x
N
)
∈E
N−k
×
(
x
N−
1
,x
N
)
c
(
x,π
(
x,
0)
,x
1
)
+
N−
1
c
(
x
k
,π
(
x
k
,k
)
,x
k
+1
)+
C
(
x
N
)
.
k
=1
More generally, the cost
J
k,N
π
is defined from time
k
according to
J
k,N
π
(
x
)=
P
π
(
x,k
)
(
x,x
k
+1
)
P
π
(
x
k
+1
,k
+1
)(
x
k
+1
,x
k
+2
)
···
(
x
k
+1
,...x
N
)
∈E
N−k
×
P
π
(
x
N−
1
,N−
1)
(
x
N−
1
,x
N
)
c
(
x,π
(
x,k
)
,x
k
+1
)
N−
1
c
(
x
k
,π
(
x
k
,k
)
,x
k
+1
)+
C
(
x
N
)
.
+
k
=
k
+1
Finite Horizon Markov Decision Problem
The
N
-horizon
Markov decision problem
consists in finding the optimal policy
π
∗
that minimizes the cost function
J
0
,N
π
.
5.3.3.3 Shortest Stochastic Path Problem
The problem that consists in reaching a target state
x
∗
as fast as possible
is called the shortest stochastic path problem [Bertsekas et al. 1996]. In that
kind of problem, there is a single terminal state, denoted by
x
∗
, such that,
for any feasible action, the only possible transition from that state is the
identical transition
x
∗
→
x
∗
. In addition, it is assumed that there exists at
least one stationary policy, which enables to reach, with non-zero probability,
Search WWH ::
Custom Search