Changing Not Just Analyzing: Control Theory and Reinforcement Learning - Realtime Data Mining


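[Figure: transition graph with two states $s_1$, $s_2$, transition probabilities $p_{11} = 0.4$, $p_{12} = 0.6$, $p_{21} = 0.3$, $p_{22} = 0.7$, and steady-state probabilities $y_1$, $y_2$]

Fig. 3.10 A graph $\Gamma(P)$ (with two states) and its steady-state probabilities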

Here, $\sigma(P)$ denotes the spectral radius of $P$, that is, the largest absolute value of its eigenvalues. Moreover, $x$ and $y$ are referred to as the right and left Perron vector, respectively. As stated above, the spectral radius satisfies $\sigma(P) = 1$ since $P$ is stochastic, and we may write the right Perron vector as

$$x = \begin{pmatrix} 1 \\ \vdots \\ 1 \end{pmatrix} \in \mathbb{R}^n.$$

Let us now consider the left Perron vector. Since, by definition, it is positive and satisfies $\mathbf{1}^T y = 1$, we may consider it as a probability distribution on $S$. By virtue of the Perron-Frobenius theorem, we obtain

$$P^k \xrightarrow{k \to \infty} \begin{pmatrix} 1 \\ \vdots \\ 1 \end{pmatrix} (y_1 \ \cdots \ y_n)$$

for primitive $P$. This distribution is referred to as the steady-state distribution (or stationary distribution), to which the user behavior converges. This property is a prerequisite for the convergence of the TD($\lambda$) algorithm as well as other procedures.
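The convergence of $P^k$ to a matrix with identical rows can be checked numerically. The following is a minimal sketch, assuming NumPy and using the two-state chain of Fig. 3.10 as input; the function names `stationary_distribution` and `power_limit` are illustrative and not taken from the text.

```python
import numpy as np

def stationary_distribution(P):
    """Left Perron vector y of a primitive stochastic matrix P.

    Satisfies y^T P = y^T and sum(y) = 1.
    """
    # Left eigenvectors of P are right eigenvectors of P^T.
    eigvals, eigvecs = np.linalg.eig(P.T)
    # Select the eigenvector whose eigenvalue is closest to 1.
    idx = np.argmin(np.abs(eigvals - 1.0))
    y = np.real(eigvecs[:, idx])
    # Dividing by the sum fixes both the scale and the sign.
    return y / y.sum()

def power_limit(P, k=50):
    """Approximate the limit of P^k by a fixed (assumed large enough) power k."""
    return np.linalg.matrix_power(P, k)

# Transition probabilities read off Fig. 3.10.
P = np.array([[0.4, 0.6],
              [0.3, 0.7]])

y = stationary_distribution(P)
Pk = power_limit(P)

print("steady-state distribution:", y)
print("P^50:")
print(Pk)
```

For this chain the second eigenvalue of $P$ is 0.1, so $P^k$ is very close to its limit already for small $k$; every row of the printed matrix should agree with $y \approx (1/3, 2/3)$ up to rounding.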

Example 3.8 To illustrate the abstract discussion, we consider a very simple example, which is depicted in Fig. 3.10.

We thus have two states and the following transition matrix $P$:

$$P = \begin{pmatrix} 0.4 & 0.6 \\ 0.3 & 0.7 \end{pmatrix}.$$
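Before working with this matrix, one may want to confirm that it meets the assumptions used above: it is stochastic, it is primitive, its spectral radius is 1, and the all-ones vector is a right eigenvector. The following is a rough sketch, assuming NumPy; the helper names `is_stochastic`, `is_primitive`, and `spectral_radius` are hypothetical, and the primitivity test relies on Wielandt's bound, which is not part of the text.

```python
import numpy as np

def is_stochastic(P, tol=1e-12):
    """True if P is square, entrywise nonnegative, and every row sums to 1."""
    P = np.asarray(P, dtype=float)
    return (P.ndim == 2 and P.shape[0] == P.shape[1]
            and bool(np.all(P >= 0))
            and bool(np.allclose(P.sum(axis=1), 1.0, atol=tol)))

def is_primitive(P):
    """True if some power of the nonnegative matrix P is entrywise positive.

    Uses Wielandt's bound: an n x n nonnegative matrix is primitive
    exactly when its power m = (n - 1)^2 + 1 has no zero entry.
    """
    # Only the zero/nonzero pattern matters, which avoids underflow.
    A = (np.asarray(P) > 0).astype(int)
    n = A.shape[0]
    B = A.copy()
    for _ in range((n - 1) ** 2):
        B = ((B @ A) > 0).astype(int)  # pattern of the next power of A
    return bool(B.all())

def spectral_radius(P):
    """Largest absolute value among the eigenvalues of P."""
    return float(np.max(np.abs(np.linalg.eigvals(P))))

P = np.array([[0.4, 0.6],
              [0.3, 0.7]])
ones = np.ones(P.shape[0])

print("stochastic:", is_stochastic(P))
print("primitive:", is_primitive(P))
print("spectral radius:", spectral_radius(P))
print("P @ 1 equals 1:", np.allclose(P @ ones, ones))
```

A periodic chain such as the one with transition matrix `[[0, 1], [1, 0]]` is stochastic but fails the primitivity check, which is why the convergence statement above is restricted to primitive $P$.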

Then the left Perron vector is given by

$$(y_1 \ y_2) \begin{pmatrix} 0.4 & 0.6 \\ 0.3 & 0.7 \end{pmatrix} = (y_1 \ y_2), \qquad (1 \ 1) \begin{pmatrix} y_1 \\ y_2 \end{pmatrix} = y_1 + y_2 = 1,$$
