and we obtain

$$
y = \begin{pmatrix} \tfrac{1}{3} \\[4pt] \tfrac{2}{3} \end{pmatrix}.
$$
In words: if a user in state 1 dwells on it with 40 % probability and switches to
state 2 with 60 % probability and, conversely, remains in state 2 with 70 % proba-
bility and moves to state 1 with 30 % probability, then, in a session of infinite
length, he or she will be found in state 1 in 33 % and in state 2 in 67 % of the trajectory.
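This can be checked numerically. The following sketch (a minimal illustration using NumPy, which is our choice of tool here and not an implementation from the text) computes the stationary distribution of the example as the left eigenvector of the transition matrix for eigenvalue 1:

```python
import numpy as np

# Transition matrix of the two-state example:
# from state 1: stay with 0.4, switch to state 2 with 0.6
# from state 2: move to state 1 with 0.3, stay with 0.7
P = np.array([[0.4, 0.6],
              [0.3, 0.7]])

# The stationary distribution y solves y P = y with sum(y) = 1,
# i.e. y is the left eigenvector of P for eigenvalue 1.
eigvals, eigvecs = np.linalg.eig(P.T)
y = np.real(eigvecs[:, np.argmin(np.abs(eigvals - 1.0))])
y = y / y.sum()

print(y)  # approximately [0.3333 0.6667], i.e. (1/3, 2/3)
```

Equivalently, repeatedly multiplying an arbitrary start distribution by P converges to the same vector, which is exactly the infinite-session interpretation given above.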
3.9.4 On the Convergence and Implementation of RL Methods
In this section, we will investigate the convergence of the two most important
methods that we have introduced so far: the policy iteration from Sect. 3.6 and the
temporal-difference learning from Sect. 3.8. So the question is: do these methods
converge to the optimal policy, and under which assumptions? The question of
convergence speed will not be addressed here.
The question of convergence is of far more than theoretical interest: for the
solution of our practical problem, namely the computation of recommendations,
we want to be sure that a solution exists at all.
We start with the policy iteration. In Sect. 3.5, we mentioned that by selecting a
policy π for the Markov decision process, we obtain a Markov chain. Indeed, by
fixing the decisions in the form of the policy, there remains nothing more to
decide in the decision process; all that is left to model is the Markov chain.
The Bellman equation (3.7) can be written compactly in vector notation as

$$
v^{\pi} = r^{\pi} + \gamma P^{\pi} v^{\pi},
\tag{3.15}
$$
where the vectors of the state values $v^{\pi}$ and rewards $r^{\pi}$ are defined as follows:

$$
v^{\pi} = \left[\, v^{\pi}(s_1)\;\; v^{\pi}(s_2)\;\; \ldots\;\; v^{\pi}(s_N) \,\right]^{T}, \qquad
r^{\pi} = \left[\, r^{\pi}(s_1)\;\; r^{\pi}(s_2)\;\; \ldots\;\; r^{\pi}(s_N) \,\right]^{T},
\tag{3.16}
$$

with

$$
r^{\pi}(s) = \sum_{a} \pi(s, a) \sum_{s'} p^{a}_{ss'} \, r^{a}_{ss'},
$$

and $P^{\pi}$ is the matrix of the transition probabilities $p_{ss'}$ (3.3).
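To make the vector notation concrete, here is a small policy-evaluation sketch. The two-state, two-action MDP and the policy below are invented purely for illustration; what the sketch shows is how $P^{\pi}$ and $r^{\pi}$ arise from $p^{a}_{ss'}$, $r^{a}_{ss'}$, and $\pi(s,a)$, and that (3.15) is just a linear system: since $P^{\pi}$ is a stochastic matrix, $I - \gamma P^{\pi}$ is invertible for $\gamma < 1$, so a unique solution $v^{\pi}$ always exists.

```python
import numpy as np

# Hypothetical 2-state, 2-action MDP, purely for illustration:
# p[a, s, s2] = p^a_{ss'}   (transition probabilities, cf. (3.3))
# r[a, s, s2] = r^a_{ss'}   (expected rewards)
# pi[s, a]    = pi(s, a)    (probability of choosing action a in state s)
gamma = 0.9
p = np.array([[[0.4, 0.6], [0.3, 0.7]],     # action 0
              [[0.8, 0.2], [0.5, 0.5]]])    # action 1
r = np.array([[[1.0, 0.0], [0.0, 2.0]],     # action 0
              [[0.5, 0.5], [1.0, 1.0]]])    # action 1
pi = np.array([[0.5, 0.5],
               [0.2, 0.8]])

# Induced Markov chain and reward vector of the policy pi:
# P^pi_{ss'} = sum_a pi(s,a) p^a_{ss'}
# r^pi(s)    = sum_a pi(s,a) sum_{s'} p^a_{ss'} r^a_{ss'}
P_pi = np.einsum('sa,ast->st', pi, p)
r_pi = np.einsum('sa,ast,ast->s', pi, p, r)

# (3.15) is a linear system; for gamma < 1 it has the unique solution
# v^pi = (I - gamma P^pi)^(-1) r^pi
v_pi = np.linalg.solve(np.eye(2) - gamma * P_pi, r_pi)
print(v_pi)
```

Solving this linear system directly is one way to carry out the policy evaluation step of the policy iteration.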
Similarly, the Bellman equation for the action-value function (3.6) can be
reformulated as