Database Reference
In-Depth Information
b ¼ r π
ð 6
:
10 Þ
so that Eq. ( 3.15 ) takes the standard form ( 6.4 ):
Av ¼ b
:
Now Ziv investigated the use of classic iteration methods such as the Richardson
method for the solution of ( 6.4 ) with the operator ( 6.9 ) and the right-hand side
( 6.10 ). Using the eigenvalue properties of the transition probability matrix P π ,it
was established by this means that algebraic smooth error vectors do arise. As was
mentioned at the start, multilevel methods are outstandingly suitable for the solu-
tion of these problems. Of course everything we have described carries over to the
case of action-value functions ( 3.17 ).
Furthermore. in [Ziv04] and [Pap10], the use of multilevel methods is investi-
gated not only for the problems of dynamic programming (i.e., model based) but
also for model-free methods, in particular for temporal-difference learning. For the
latter, two multilevel methods are proposed, the first a multiplicative variant and the
second an additive variant. The multiplicative variant is less convincing, and it
cannot be fully proven to converge. In the following we will investigate the
multigrid method for the model-based case and then move to Ziv's additive
preconditioner for the model-free case.
6.2.1
Interpolation and Restriction Based on State
Aggregation
We consider a hierarchy with the levels 0, ... , L , where 0 is the finest grid which
represents the current state values from S and L the coarsest grid. An interpolator
I lþ1 from level 1 to level l shall be given. For the restriction I l of level 1to
the level l in general, the transpose or pseudo-inverse of I lþ1 is used.
In [Ziv04] and [Pap10] the construction of the interpolator I lþ1 was considered
on an algebraic basis in the course of the algebraic multigrid. In particular the state
aggregation was considered according to [BC89] that we will also apply. For this
purpose the state space S is split into K disjoint groups G β ,
β ¼ 1,
...
, m . The
definition of the interpolator is
I 1 ¼
1,
G β
0, else
i
ð 6 : 11 Þ
and the restrictor is defined as its pseudo-inverse according to Moore-Penrose:
h
i 1
I 1 T I 1
I 1 T
I 1
l
¼
:
ð 6
:
12 Þ
Search WWH ::




Custom Search