Database Reference
In-Depth Information
s 3
s 4
0
0.5
0
0
0.5
1
0
0
0
0
P
=
0.8
0.2
0
0
0
s 2
0
0
1
0
0
0
0
0.2
0.3
0.5
s 5
s 1
Fig. 3.9 Example of a matrix P (with five states) and the thereby induced graph Γ ( P ). The latter
contains, e.g., the cycles ( s 1 , s 2 , s 1 ), ( s 1 , s 5 , s 4 , s 3 , s 1 ), as well as ( s 5 , s 5 )
3.9.2 Properties of Graphs and Matrices
In the following, we shall, on one hand, consider the matrix of transition probabil-
ities P ¼ ( p ss 0 ) s , s 0 S , on the other hand, the thereby induced graph
( P ). (Here, we
shall ignore the actions a ; P ¼ P π will denote the transition probability matrix of
the Markov chain M π .) For a detailed version of this discussion, we refer the reader
to the fundamental work by Paprotny [Pap10]. Furthermore, the general
link between matrix analysis and numerical linear algebra on one hand and dynamic
programming and stochastic iterative methods on the other hand has been
studied therein.
We first define the graph
Γ
( P ) of the matrix P . We recall that a directed graph is
described by its sets of nodes and edges. In our case, the set of nodes is precisely
the state space, and the set of edges is the set of possible state transitions.
Thus, formally, the directed graph of P is defined as
Γ
E ,
Γ
ðÞ:¼ n
;
ð
i
;
E , p ij
0
:
A tuple ( s 1 ,
...
, s l ) of nodes s i is said to be a path of length l -1 from s 1 to s l ,if
ð
s i ;
s 1
Þ
E 8i
l 1
:
A path is called a cycle ,if s 1 ¼ s l . A node is reachable from v
S , if there is a
path from v to w in
. These notions are illustrated by Fig. 3.9 .
Intuitively, a matrix is said to be reducible if it can be transformed into an upper
block triangular matrix by some permutation. A matrix is called irreducible if it is
not reducible. The graph
Γ
Γ
is said to be strongly connected, if for each pair of nodes
v , w
S , there is a path from v to w in
Γ
. The following holds: P is irreducible
if and only if
Γ
( P ) is strongly connected. So, P in the example from Fig. 3.9 is
irreducible.
Irreducibility of P is a prerequisite for several important properties in RL and, in
particular, ensures convergence of some important procedures, as, for example, the
Search WWH ::




Custom Search