10.1 Markov-k-Processes and Augmented State Spaces
The notion of a k-Markov decision process (k-MDP) is a generalization of that of an MDP. The generalized framework makes it possible to formulate control problems involving environments where transition probabilities, rewards, and policies depend on the sequence of the k most recently visited states rather than only on the current one. We shall refer to this assumption as the generalized Markov property.
Assumption 10.1 (Generalized Markov property): In each state, the choice of the optimal action depends exclusively on the k most recently visited states.
Hence, given state and action spaces S and A, the dynamics are characterized by transition probabilities of the form

p(s_1, …, s_k, a, s′),    s_1, …, s_k, s′ ∈ S, a ∈ A,
which translates into plain English as “The probability of a transition to state s′, given action a, current state s_k, and previous states s_1, …, s_{k−1}.” Similarly, the reward function and policies are of the form

r(s_1, …, s_k, s′) and π(s_1, …, s_k, a),
respectively. It is worth stressing that by assigning k := 1, we recover the classical MDP framework.
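To make the generalized Markov property concrete, here is a minimal sketch of a toy 2-MDP (k = 2). All names, states, and probabilities below are invented for illustration and are not taken from the text; the point is only that p takes the two most recently visited states as arguments, not just the current one.

```python
# A toy 2-MDP sketch: transition probabilities depend on the pair
# (previous state, current state), i.e. on a history of length k = 2.

S = ["x", "y"]        # state space (illustrative)
A = ["go", "wait"]    # action space (illustrative)

def p(s1, s2, a, s_next):
    """p(s_1, s_2, a, s'): probability of reaching s_next given the
    previous state s1, the current state s2, and the action a."""
    if a == "wait":
        return 1.0 if s_next == s2 else 0.0
    # "go" tries to move to the other state; bouncing straight back
    # to the state we just left (s1) is assumed to succeed less often.
    other = "y" if s2 == "x" else "x"
    success = 0.6 if s1 == other else 0.9
    return success if s_next == other else 1.0 - success

# Sanity check: for every history (s1, s2) and action a, the
# probabilities over next states sum to one.
for s1 in S:
    for s2 in S:
        for a in A:
            assert abs(sum(p(s1, s2, a, sn) for sn in S) - 1.0) < 1e-12
```

Note that p("y", "x", "go", "y") differs from p("x", "x", "go", "y"): the same current state and action yield different dynamics depending on the previous state, which is exactly what a classical MDP cannot express.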
It comes as a bit of a surprise that, in some theoretical sense, there is no distinction between the classical and the k-MDP case. Given a k-MDP, it is always possible to devise an equivalent MDP by means of a construction which we shall refer to as state space augmentation. (The precise meaning of the term equivalent will become clear in the course of the subsequent discussion.)
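The core mechanism of the construction can be anticipated with a small sketch: an augmented state is the tuple of the (at most) k most recently visited states, and each transition appends the new state while discarding the oldest. The helper name below is an assumption for illustration, not notation from the text.

```python
# Hypothetical sketch of state space augmentation: an augmented state
# is the tuple of the (at most) k most recently visited states.

def augment(history, s_next, k):
    """Return the augmented state reached from `history` (a tuple of
    at most k states) after a transition to `s_next`: append the new
    state and keep only the k most recent entries."""
    return (tuple(history) + (s_next,))[-k:]

# Early in an episode the tuple is shorter than k; it grows until it
# reaches length k and then slides along the trajectory.
h = ()
for s in ["a", "b", "c", "d"]:
    h = augment(h, s, k=2)
print(h)  # the two most recently visited states
```

The varying tuple length early in an episode is why the augmented state space defined next is a union over all tuple lengths from 1 to k, rather than S^k alone.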
Given the state space S of some k-MDP M, we define the corresponding augmented state space of order k by

S̃ := ⋃_{j=1}^{k} S^j.

Upon this state space, we model an MDP M̃ := (S̃, A, p̃, r̃) as follows: for s̃ := (s_1, …, s_l), let