10.1 Markov-k-Processes and Augmented State Spaces
The notion of a k-Markov decision process (k-MDP) is a generalization of that of an MDP. The generalized framework makes it possible to formulate control problems involving environments where transition probabilities, rewards, and policies depend on the sequence of the k most recently visited states rather than only on the current one. We shall refer to this assumption as the generalized Markov property.
Assumption 10.1 (Generalized Markov property): In each state, the choice of the optimal action depends exclusively on the k most recently visited states.
Hence, given state and action spaces S and A, the dynamics are characterized by transition probabilities of the form

p(s_1, …, s_k, a, s′),    s_1, …, s_k, s′ ∈ S, a ∈ A,
which translates into plain English as “The probability of a transition to state s′, given action a, current state s_k, and previous states s_1, …, s_{k−1}.” Similarly, the reward function and policies are of the form

r(s_1, …, s_k, s′) and π(s_1, …, s_k, a),
respectively. It is worth stressing that by assigning k := 1, we recover the classical MDP framework.
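To make the generalized Markov property concrete, here is a minimal sketch of a toy 2-MDP (k = 2). All names, states, and probabilities below are invented for illustration and are not taken from the text; the point is only that p takes the two most recently visited states as arguments, not just the current one.

```python
# A toy 2-MDP sketch: transition probabilities depend on the pair
# (previous state, current state), i.e. on a history of length k = 2.

S = ["x", "y"]        # state space (illustrative)
A = ["go", "wait"]    # action space (illustrative)

def p(s1, s2, a, s_next):
    """p(s_1, s_2, a, s'): probability of reaching s_next given the
    previous state s1, the current state s2, and the action a."""
    if a == "wait":
        return 1.0 if s_next == s2 else 0.0
    # "go" tries to move to the other state; bouncing straight back
    # to the state we just left (s1) is assumed to succeed less often.
    other = "y" if s2 == "x" else "x"
    success = 0.6 if s1 == other else 0.9
    return success if s_next == other else 1.0 - success

# Sanity check: for every history (s1, s2) and action a, the
# probabilities over next states sum to one.
for s1 in S:
    for s2 in S:
        for a in A:
            assert abs(sum(p(s1, s2, a, sn) for sn in S) - 1.0) < 1e-12
```

Note that p("y", "x", "go", "y") differs from p("x", "x", "go", "y"): the same current state and action yield different dynamics depending on the previous state, which is exactly what a classical MDP cannot express.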
It comes as a bit of a surprise that, in some theoretical sense, there is no distinction between the classical and the k-MDP case. Given a k-MDP, it is always possible to devise an equivalent MDP by means of a construction which we shall refer to as state space augmentation. (The precise meaning of the term equivalent will become clear in the course of the subsequent discussion.)
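The core mechanism of the construction can be anticipated with a small sketch: an augmented state is the tuple of the (at most) k most recently visited states, and each transition appends the new state while discarding the oldest. The helper name below is an assumption for illustration, not notation from the text.

```python
# Hypothetical sketch of state space augmentation: an augmented state
# is the tuple of the (at most) k most recently visited states.

def augment(history, s_next, k):
    """Return the augmented state reached from `history` (a tuple of
    at most k states) after a transition to `s_next`: append the new
    state and keep only the k most recent entries."""
    return (tuple(history) + (s_next,))[-k:]

# Early in an episode the tuple is shorter than k; it grows until it
# reaches length k and then slides along the trajectory.
h = ()
for s in ["a", "b", "c", "d"]:
    h = augment(h, s, k=2)
print(h)  # the two most recently visited states
```

The varying tuple length early in an episode is why the augmented state space defined next is a union over all tuple lengths from 1 to k, rather than S^k alone.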
Given the state space S of some k-MDP M, we define the corresponding augmented state space of order k by

S̃ := ⋃_{j=1}^{k} S^j.

Upon this state space, we model an MDP M̃ := (S̃, A, p̃, r̃) as follows: for s̃ := (s_1, …, s_l), let