Fig. 3.10 A graph Γ(P) (with two states) and its steady-state probabilities
[Figure: states s_1 and s_2; transition probabilities p_11 = 0.4, p_12 = 0.6, p_21 = 0.3, p_22 = 0.7; steady-state probabilities y_1 = 1/3, y_2 = 2/3]
Here, σ(P) denotes the spectral radius of P, that is, the largest absolute value of its eigenvalues. Moreover, x and y are referred to as the right and left Perron vectors, respectively. As stated above, the spectral radius satisfies σ(P) = 1 since P is stochastic, and we may write the right Perron vector as
$$
x = \mathbf{1} = \begin{pmatrix} 1 \\ \vdots \\ 1 \end{pmatrix} \in \mathbb{R}^n .
$$
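The two claims above (the all-ones vector is a right eigenvector for the eigenvalue 1, and the spectral radius equals 1) can be checked numerically. The following is a minimal sketch, assuming the two-state transition probabilities shown in Fig. 3.10; the helper names `mat_vec` and `spectral_radius_2x2` are illustrative and not taken from the text, and the closed-form eigenvalue formula only applies to 2x2 matrices.

```python
import math

# Transition matrix read off Fig. 3.10 (assumed here for illustration);
# each row sums to 1, so P is stochastic.
P = [[0.4, 0.6],
     [0.3, 0.7]]


def mat_vec(A, v):
    """Return the matrix-vector product A v."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def spectral_radius_2x2(A):
    """Largest absolute eigenvalue of a real 2x2 matrix.

    Uses the characteristic polynomial
    lambda^2 - trace * lambda + det = 0.
    """
    trace = A[0][0] + A[1][1]
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    disc = trace * trace - 4.0 * det
    if disc >= 0:
        root = math.sqrt(disc)
        return max(abs((trace + root) / 2.0), abs((trace - root) / 2.0))
    # Complex-conjugate pair: |lambda|^2 equals the determinant.
    return math.sqrt(det)


ones = [1.0, 1.0]
P_ones = mat_vec(P, ones)      # should reproduce the all-ones vector
rho = spectral_radius_2x2(P)   # should be 1 up to rounding

print("P * 1 =", P_ones)
print("spectral radius =", rho)
```

For larger matrices one would normally rely on a numerical eigenvalue routine instead of the closed form used here.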
Let us now consider the left Perron vector. Since, by definition, it is positive and satisfies $\mathbf{1}^T y = 1$, we may consider it as a probability distribution on S. By virtue of the Perron-Frobenius theorem, we obtain
$$
P^k \xrightarrow{\;k \to \infty\;} \begin{pmatrix} 1 \\ \vdots \\ 1 \end{pmatrix} \begin{pmatrix} y_1 & \cdots & y_n \end{pmatrix}
$$
for primitive P. This distribution is referred to as the steady-state distribution (or stationary distribution), to which the user behavior converges. This property is a prerequisite for the convergence of the TD(λ) algorithm as well as other procedures.
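The convergence of the powers of P can be observed directly. The sketch below is an illustration under assumptions: it uses the two-state transition probabilities from Fig. 3.10, and the helpers `mat_mul` and `mat_power` as well as the choice of 50 steps are hypothetical, not part of the text. If the limit statement holds, every row of the resulting matrix should agree with the steady-state probabilities of Fig. 3.10.

```python
# Transition matrix read off Fig. 3.10 (assumed here for illustration).
P = [[0.4, 0.6],
     [0.3, 0.7]]


def mat_mul(A, B):
    """Return the matrix product A B for lists of rows."""
    inner = len(B)
    cols = len(B[0])
    return [[sum(row[k] * B[k][j] for k in range(inner)) for j in range(cols)]
            for row in A]


def mat_power(A, k):
    """Return A^k by repeated multiplication, starting from the identity."""
    n = len(A)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(k):
        result = mat_mul(result, A)
    return result


# 50 steps is an arbitrary choice that is ample for this small example.
P_50 = mat_power(P, 50)

for row in P_50:
    print([round(v, 6) for v in row])
```

Because every row of the limit equals the same distribution, the starting state has no influence on the long-run state probabilities.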
Example 3.8 To illustrate the abstract discussion, we consider a very simple example, which is depicted in Fig. 3.10. We thus have two states and the following transition matrix P:
$$
P = \begin{pmatrix} 0.4 & 0.6 \\ 0.3 & 0.7 \end{pmatrix}.
$$
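Then the left Perron vector is given by
$$
\begin{pmatrix} y_1 & y_2 \end{pmatrix} \begin{pmatrix} 0.4 & 0.6 \\ 0.3 & 0.7 \end{pmatrix} = \begin{pmatrix} y_1 & y_2 \end{pmatrix}, \qquad \begin{pmatrix} 1 & 1 \end{pmatrix} \begin{pmatrix} y_1 \\ y_2 \end{pmatrix} = y_1 + y_2 = 1,
$$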