[Figure: two states s_1 and s_2 with transition probabilities p_11 = 0.4, p_12 = 0.6, p_21 = 0.3, p_22 = 0.7 and steady-state probabilities y_1 = 1/3, y_2 = 2/3]
Fig. 3.10 A graph Γ(P) (with two states) and its steady-state probabilities
Here, $\sigma(P)$ denotes the spectral radius of $P$, that is, the largest absolute value of its eigenvalues. Moreover, $x$ and $y$ are referred to as the right and left Perron vectors, respectively. As stated above, the spectral radius satisfies $\sigma(P) = 1$ since $P$ is stochastic, and we may write the right Perron vector as
\[
x = \frac{1}{n} \begin{pmatrix} 1 \\ \vdots \\ 1 \end{pmatrix}.
\]
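The two claims above (spectral radius 1 and a constant right Perron vector) can be checked numerically. The following is a minimal sketch, assuming a hypothetical 3-state row-stochastic matrix that does not appear in the text; the helper name `mat_vec` is likewise illustrative. It confirms that every row sums to 1, so the maximum absolute row sum, which bounds the spectral radius from above, equals 1, and that the constant vector is mapped to itself, so 1 is indeed an eigenvalue.

```python
from fractions import Fraction


def mat_vec(P, x):
    """Return the matrix-vector product P x for a matrix given as a list of rows."""
    return [sum(p_ij * x_j for p_ij, x_j in zip(row, x)) for row in P]


# Hypothetical 3-state row-stochastic matrix (an assumption, not from the text).
# Exact fractions avoid floating-point rounding in the equality check below.
P = [
    [Fraction(1, 2), Fraction(1, 4), Fraction(1, 4)],
    [Fraction(1, 5), Fraction(3, 5), Fraction(1, 5)],
    [Fraction(1, 3), Fraction(1, 3), Fraction(1, 3)],
]
n = len(P)

# Each row sums to 1, so the maximum absolute row sum (the infinity norm) is 1.
# The spectral radius can never exceed this norm.
row_sums = [sum(row) for row in P]
inf_norm = max(sum(abs(p) for p in row) for row in P)

# x = (1/n) * (1, ..., 1)^T satisfies P x = x, so 1 is an eigenvalue of P.
x = [Fraction(1, n)] * n
Px = mat_vec(P, x)

print(row_sums, inf_norm, Px == x)
```

Together, the upper bound and the eigenvalue 1 give a spectral radius of exactly 1 for this matrix.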
Let us now consider the left Perron vector. Since, by definition, it is positive and satisfies $\mathbf{1}^T y = 1$, we may consider it as a probability distribution on $S$. By virtue of the Perron-Frobenius theorem, we obtain
\[
P^k \xrightarrow{k \to \infty} \begin{pmatrix} 1 \\ \vdots \\ 1 \end{pmatrix} \begin{pmatrix} y_1 & \dots & y_n \end{pmatrix}
\]
for primitive $P$. This distribution is referred to as the steady-state distribution (or stationary distribution), to which the user behavior converges. This property is a prerequisite for the convergence of the TD(λ) algorithm as well as other procedures.
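The convergence statement suggests a simple way to approximate the steady-state distribution: start from any probability distribution and multiply it by $P$ repeatedly. The following is a minimal sketch under stated assumptions, not a definitive implementation. The 3-state matrix, the function names `vec_mat` and `stationary_distribution`, and the parameters `start`, `tol` and `max_iter` are all hypothetical and chosen only for illustration.

```python
def vec_mat(v, P):
    """Return the row-vector/matrix product v P."""
    n = len(P)
    return [sum(v[i] * P[i][j] for i in range(n)) for j in range(len(P[0]))]


def stationary_distribution(P, start=None, tol=1e-12, max_iter=10_000):
    """Approximate y with y P = y and sum(y) = 1 by repeated multiplication.

    For primitive P the iterates converge regardless of the starting distribution.
    """
    n = len(P)
    y = list(start) if start is not None else [1.0 / n] * n
    for _ in range(max_iter):
        y_next = vec_mat(y, P)
        # Stop once successive iterates no longer change noticeably.
        if max(abs(a - b) for a, b in zip(y, y_next)) < tol:
            return y_next
        y = y_next
    raise RuntimeError("no convergence within max_iter; P may not be primitive")


# Hypothetical 3-state row-stochastic matrix with positive entries (an assumption).
P = [
    [0.5, 0.25, 0.25],
    [0.2, 0.6, 0.2],
    [1 / 3, 1 / 3, 1 / 3],
]

y_a = stationary_distribution(P, start=[1.0, 0.0, 0.0])
y_b = stationary_distribution(P, start=[0.0, 0.0, 1.0])
print(y_a)
print(y_b)
```

Both starting distributions lead to the same limit, which mirrors the fact that every row of $P^k$ approaches the same vector $y^T$.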
Example 3.8 To illustrate the abstract discussion, we consider a very simple example, which is depicted in Fig. 3.10.
We thus have two states and the following transition matrix $P$:
\[
P = \begin{pmatrix} 0.4 & 0.6 \\ 0.3 & 0.7 \end{pmatrix}
\]
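Before solving for the left Perron vector, it may help to confirm that this matrix meets the conditions used above. The sketch below, with the illustrative helper names `mat_mul` and `mat_pow`, checks that $P$ is row-stochastic and entrywise positive (which implies primitivity), and then forms $P^{20}$ to observe that its two rows have become practically identical, as the limit of $P^k$ predicts. The exponent 20 is an arbitrary choice.

```python
def mat_mul(A, B):
    """Multiply two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_pow(P, k):
    """Return P^k for k >= 1 by repeated multiplication."""
    result = P
    for _ in range(k - 1):
        result = mat_mul(result, P)
    return result


# Transition matrix of Example 3.8.
P = [[0.4, 0.6],
     [0.3, 0.7]]

# Row-stochastic: every row sums to 1 (up to floating-point rounding).
is_stochastic = all(abs(sum(row) - 1.0) < 1e-12 for row in P)

# All entries positive, so P itself is already a positive power: P is primitive.
is_positive = all(p > 0 for row in P for p in row)

# For large k the rows of P^k should coincide.
P20 = mat_pow(P, 20)
row_gap = max(abs(a - b) for a, b in zip(P20[0], P20[1]))

print(is_stochastic, is_positive)
print(P20)
print(row_gap)
```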
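Then the left Perron vector is given by
\[
\begin{pmatrix} y_1 & y_2 \end{pmatrix} \begin{pmatrix} 0.4 & 0.6 \\ 0.3 & 0.7 \end{pmatrix} = \begin{pmatrix} y_1 & y_2 \end{pmatrix}, \qquad \begin{pmatrix} 1 & 1 \end{pmatrix} \begin{pmatrix} y_1 \\ y_2 \end{pmatrix} = y_1 + y_2 = 1,
\]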
 