Changing Not Just Analyzing: Control Theory and Reinforcement Learning - Realtime Data Mining


TD(λ) algorithm. If the system is reducible, it may be decomposed into smaller irreducible subsystems that may then be considered separately.

Since P is a row stochastic matrix, that is, a matrix of transition probabilities, its rows sum to 1; see (3.2). In other words, the vector of all ones is a right eigenvector of P corresponding to the eigenvalue 1. Since P is nonnegative, its largest row sum coincides with its row sum norm, which therefore equals 1. The spectral radius of P, that is, the largest absolute value of its eigenvalues, cannot exceed any induced matrix norm, and 1 is itself an eigenvalue. Therefore, 1 is also the spectral radius of P.
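The two facts used in this argument can be checked numerically. The following is a minimal sketch in plain Python; the 3-state matrix `P` and the helper names `mat_vec` and `row_sum_norm` are illustrative assumptions and do not come from the text.

```python
# Hypothetical 3-state transition matrix (an assumption, not from the text).
# Every entry is nonnegative and every row sums to 1.
P = [
    [0.5, 0.5, 0.0],
    [0.25, 0.5, 0.25],
    [0.0, 0.5, 0.5],
]


def mat_vec(A, v):
    """Return the matrix-vector product A v."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def row_sum_norm(A):
    """Return the row sum norm of A, i.e. its largest absolute row sum."""
    return max(sum(abs(a) for a in row) for row in A)


ones = [1.0] * len(P)

# P applied to the all-ones vector yields the vector of row sums,
# so row stochasticity means P * ones == ones (eigenvalue 1).
P_ones = mat_vec(P, ones)

# For a nonnegative matrix the row sum norm is just the largest row sum.
norm = row_sum_norm(P)

print(P_ones)
print(norm)
```

Since the spectral radius is bounded above by the row sum norm and 1 is an eigenvalue, both checks together confirm that the spectral radius of this example matrix is 1.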

A fundamental result of the theory of nonnegative matrices states that any irreducible nonnegative matrix has a positive spectral radius which is itself an eigenvalue. Furthermore, this eigenvalue is algebraically simple. This eigenvalue and the corresponding left (right) eigenvector are referred to as the Perron eigenvalue and the left (right) Perron vector, respectively.

The matrix P is said to be primitive if it is irreducible and all of its eigenvalues except the Perron eigenvalue have an absolute value strictly smaller than the spectral radius. This abstract definition may be captured by the following simple criterion: the matrix P is primitive if and only if $P^k$ is (strictly) positive for some positive integer k. The following sufficient condition holds: the matrix P is primitive if it is irreducible and $p_{ii} > 0$ for some $i \in \{1, \dots, n\}$. Hence, for example, our matrix from Fig. 3.9 is primitive, since $p_{55} = 0.5 > 0$. In terms of the graph Γ(P), this corresponds to the criterion of the existence of a cycle of length 1, that is, a node that is connected to itself.

Thus, we essentially conclude our brief introduction to fundamental algebraic properties of the transition probability matrix P. At the same time, we saw that each of these has an intuitive graph-theoretical counterpart with respect to the graph induced by P. We will make use of these properties below.
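The criterion that P is primitive if and only if some power of P is strictly positive can be turned into a simple test. The sketch below is one possible way to do this, not the book's method: it works only with the zero/nonzero pattern of P and stops after $(n-1)^2 + 1$ powers, which is Wielandt's classical bound on the smallest positive power of a primitive matrix. The function name `is_primitive` and both example matrices are assumptions made for illustration.

```python
def is_primitive(P):
    """Check whether the nonnegative square matrix P is primitive.

    Tests whether P^k is strictly positive for some k between 1 and
    (n - 1)^2 + 1 (Wielandt's bound), using only the zero/nonzero
    pattern of P so that no floating-point underflow can occur.
    """
    n = len(P)
    pattern = [[p > 0 for p in row] for row in P]  # pattern of P
    power = [row[:] for row in pattern]            # pattern of P^k, k = 1
    for _ in range((n - 1) ** 2 + 1):
        if all(all(row) for row in power):
            return True
        # Pattern of P^(k+1): entry (i, j) is nonzero if some path
        # i -> m of length k can be extended by an edge m -> j.
        power = [
            [any(power[i][m] and pattern[m][j] for m in range(n))
             for j in range(n)]
            for i in range(n)
        ]
    return False


# Hypothetical examples (assumptions, not the matrix from Fig. 3.9).
# A pure 3-cycle is irreducible but periodic, hence not primitive.
cycle = [
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [1.0, 0.0, 0.0],
]

# The same cycle with a self-loop at the first node (p_11 > 0)
# satisfies the sufficient condition and is primitive.
cycle_with_loop = [
    [0.5, 0.5, 0.0],
    [0.0, 0.0, 1.0],
    [1.0, 0.0, 0.0],
]

print(is_primitive(cycle))
print(is_primitive(cycle_with_loop))
```

The second example illustrates the graph interpretation: adding a single cycle of length 1 to an irreducible graph is enough to make the matrix primitive.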

3.9.3 The Steady-State Distribution

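As noted above, the property (3.2) is called row stochasticity. If P is primitive, the Perron-Frobenius theorem implies that

\[
\lim_{k \to \infty} \left( \frac{P}{\rho(P)} \right)^{k} = \frac{x y^T}{y^T x}, \qquad x, y \in \mathbb{R}^n,
\]

where $\rho(P)$ denotes the spectral radius of P and

\[
P x = \rho(P)\, x, \quad x > 0, \quad
\begin{pmatrix} 1 & \cdots & 1 \end{pmatrix}
\begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} = 1;
\]

\[
y^T P = \rho(P)\, y^T, \quad y > 0, \quad
\begin{pmatrix} 1 & \cdots & 1 \end{pmatrix}
\begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix} = 1.
\]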
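For a row stochastic matrix the spectral radius is 1 and the right Perron vector is a multiple of the vector of all ones, so the limit above has identical rows, each equal to the left Perron vector normalized to sum 1. The following plain-Python sketch illustrates this by repeated multiplication; the matrix `P`, the helper `mat_mul`, and the choice of 100 multiplications are assumptions made for illustration only.

```python
def mat_mul(A, B):
    """Return the product of two square matrices given as nested lists."""
    n = len(A)
    return [
        [sum(A[i][m] * B[m][j] for m in range(n)) for j in range(n)]
        for i in range(n)
    ]


# Hypothetical primitive transition matrix (an assumption, not from the text):
# it is irreducible and has positive diagonal entries.
P = [
    [0.5, 0.5, 0.0],
    [0.25, 0.5, 0.25],
    [0.0, 0.5, 0.5],
]

# Approximate the limit of P^k by a high power of P.
Pk = P
for _ in range(100):
    Pk = mat_mul(Pk, P)

# Every row of the limit is the normalized left Perron vector.
pi = Pk[0]

# Check the left eigenvector property pi^T P = pi^T.
pi_P = [sum(pi[i] * P[i][j] for i in range(len(P))) for j in range(len(P))]

print(pi)
print(pi_P)
```

For this example the rows converge to approximately (0.25, 0.5, 0.25), which can be confirmed by solving $y^T P = y^T$ by hand. Repeated multiplication is used here only because it mirrors the limit statement; solving the eigenvector equation directly is usually preferable in practice.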
