10.1 Markov-k-Processes and Augmented State Spaces
The notion of a k-Markov decision process (k-MDP) is a generalization of that of an
MDP. The generalized framework makes it possible to formulate control problems
involving environments where transition probabilities, rewards, and policies depend
on the sequence of the k most recently visited states rather than on the current one alone.
We shall refer to this assumption as the generalized Markov property.
Assumption 10.1 (Generalized Markov property): In each state, the choice of
the optimal action depends exclusively on the k most recently visited states.
Hence, given state and action spaces S and A, the dynamics are characterized by
transition probabilities of the form

$$p(s_1, \ldots, s_k, a, s'), \qquad s_1, \ldots, s_k, s' \in S, \quad a \in A,$$

which translates into plain English as "the probability of a transition to state $s'$,
given action $a$, current state $s_k$, and previous states $s_1, \ldots, s_{k-1}$."
Similarly, the reward function and policies are of the form

$$r(s_1, \ldots, s_k, s') \qquad \text{and} \qquad \pi(s_1, \ldots, s_k, a),$$
respectively. It is worth stressing that by assigning k := 1, we recover the classical
MDP framework.
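To make these signatures concrete, here is a minimal sketch in Python (all names and numbers are illustrative assumptions, not taken from the text): for k = 2, the transition law, reward, and policy each consume the two most recently visited states.

```python
# A toy two-state process with k = 2: dynamics, reward, and policy all
# depend on the pair (previous state, current state). All numbers are
# illustrative assumptions, not taken from the text.
S = [0, 1]
A = ["stay", "move"]

def p(history, a, s_next):
    """Transition probability p(s_1, ..., s_k, a, s') for k = 2."""
    s_prev, s_cur = history
    if a == "move":
        # An arbitrary second-order effect: moving succeeds more often
        # when the previous state differed from the current one.
        success = 0.9 if s_prev != s_cur else 0.6
        return success if s_next != s_cur else 1.0 - success
    return 0.95 if s_next == s_cur else 0.05

def r(history, s_next):
    """Reward r(s_1, ..., s_k, s'): a unit reward for reaching state 1."""
    return 1.0 if s_next == 1 else 0.0

def pi(history):
    """A deterministic policy: the chosen action depends on both states."""
    s_prev, s_cur = history
    return "move" if s_prev == s_cur else "stay"

# Sanity check: successor probabilities sum to one for every (history, action).
for h in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    for a in A:
        assert abs(sum(p(h, a, s) for s in S) - 1.0) < 1e-9
```

With k = 1 the history window shrinks to a single state, and the three functions reduce to the familiar MDP signatures.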
It comes as a bit of a surprise that, in some theoretical sense, there is no
distinction between the classical and the k-MDP case. Given a k-MDP, it is always
possible to devise an equivalent MDP by means of a construction which we shall
refer to as state space augmentation . (The precise meaning of the term equivalent
will become clear in the course of the subsequent discussion.)
Given the state space S of some k-MDP M, we define the corresponding
augmented state space of order k by

$$\tilde{S} := \bigcup_{j=1}^{k} S^j.$$

Upon this state space, we model an MDP $\tilde{M} := (\tilde{S}, A, \tilde{p}, \tilde{r})$
as follows: for $\tilde{s} := (s_1, \ldots, s_l)$, let ...
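The augmentation itself is easy to mirror in code. The following minimal sketch (hypothetical helper names; it assumes the natural reading of the definition, under which an augmented state is the tuple of the up-to-k most recently visited states) enumerates $\tilde{S}$ and maps a k-MDP trajectory onto it.

```python
from collections import deque
from itertools import product

def augmented_state_space(S, k):
    """Enumerate S-tilde: the union over j = 1..k of the Cartesian powers S^j."""
    return [t for j in range(1, k + 1) for t in product(S, repeat=j)]

def augment_trajectory(states, k):
    """Map a state sequence s_1, s_2, ... of the k-MDP to augmented states:
    each element is the tuple of the (at most) k most recently visited states."""
    window = deque(maxlen=k)
    augmented = []
    for s in states:
        window.append(s)
        augmented.append(tuple(window))
    return augmented

S = [0, 1]
print(augmented_state_space(S, 2))
# [(0,), (1,), (0, 0), (0, 1), (1, 0), (1, 1)]
print(augment_trajectory([0, 1, 1, 0], 2))
# [(0,), (0, 1), (1, 1), (1, 0)]
```

Because each augmented state carries the full window of recent states, the transition law on $\tilde{S}$ needs to condition only on the current augmented state, which is precisely why the resulting process is an ordinary (first-order) MDP.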