Let us now return to the special case where the transfer tensor U is taken to be an
aggregation prolongator. Intuitively, the following proposition, which, once stated,
is obvious, tells us that refining the underlying partition results in refining the
corresponding function space. Specifically, we say that a partition $G'$ is a refinement
of a partition $G$ if each element of $G'$ is a subset of some element of $G$. In other
words, $G'$ is obtained by subdividing elements of $G$ into smaller sets.
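The refinement relation just defined can be checked mechanically. The following is a minimal sketch, assuming partitions are represented as lists of frozensets over a small index set; the function name `is_refinement` and the sample partitions are illustrative and not taken from the text.

```python
def is_refinement(fine, coarse):
    """Return True if every block of `fine` is a subset of some block of `coarse`."""
    return all(any(block <= parent for parent in coarse) for block in fine)


# Hypothetical partitions of the index set {0, 1, 2, 3, 4}.
G = [frozenset({0, 1, 2}), frozenset({3, 4})]
G_prime = [frozenset({0}), frozenset({1, 2}), frozenset({3, 4})]

print(is_refinement(G_prime, G))  # True: G_prime subdivides the block {0, 1, 2}
print(is_refinement(G, G_prime))  # False: {0, 1, 2} fits into no block of G_prime
```

Every partition is trivially a refinement of itself, so the relation is reflexive.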
Proposition 10.1 Let $G, G'$ be partitions with corresponding aggregation
prolongators $U, U'$. If $G'$ is a refinement of $G$, then the range of $U$ is contained in
that of $U'$.
Proof All we have to show is that each column of U may be written as some
linear combination of columns of $U'$. To this end, consider $\beta \in G$ and let
$\beta'_1, \ldots, \beta'_l \in G'$ satisfy
$$\beta = \beta'_1 \cup \ldots \cup \beta'_l.$$
This together with the fact that $\beta'_1, \ldots, \beta'_l$ are disjoint yields
$$U_\beta = U'_{\beta'_1} + \ldots + U'_{\beta'_l}.$$
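The column identity at the end of the proof can be verified numerically on a small case. The sketch below assumes that each column of an aggregation prolongator is the (unnormalized) indicator vector of the corresponding aggregate, which is consistent with the sum formula above but is not spelled out in this section. The helper names and the sample partitions are hypothetical.

```python
def prolongator_column(block, n):
    """Indicator vector of `block` in R^n (assumed form of a prolongator column)."""
    return [1 if i in block else 0 for i in range(n)]


def column_identity_holds(coarse, fine, n):
    """Check that each coarse column equals the sum of the fine columns it contains."""
    for beta in coarse:
        parts = [b for b in fine if b <= beta]
        fine_columns = [prolongator_column(b, n) for b in parts]
        summed = [sum(entries) for entries in zip(*fine_columns)]
        if summed != prolongator_column(beta, n):
            return False
    return True


# Hypothetical partitions of {0, ..., 4}; G_prime refines G.
n = 5
G = [{0, 1, 2}, {3, 4}]
G_prime = [{0}, {1, 2}, {3, 4}]

print(column_identity_holds(G, G_prime, n))  # True
print(column_identity_holds(G_prime, G, n))  # False: G is not a refinement of G_prime
```

Since every column of the coarse prolongator is a sum of columns of the fine one, its range is contained in the range of the fine prolongator, as the proposition states.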
In particular, this result ensures that refining a partition cannot cause any
deterioration of the approximation. Yet another straightforward calculation yields
the following crucial and, at the same time, rather astonishing insight.
Proposition 10.2 Approximate TD($\lambda$) for k-MDPs with the approximation
architecture induced by the Tucker model with transfer tensor taken to be the
aggregation prolongator corresponding to the partition with only one element is
equivalent to classical TD($\lambda$) for 1-MDPs applied to a k-MDP.
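As a point of reference for the classical algorithm named in the proposition, one step of tabular TD($\lambda$) with accumulating eligibility traces may be sketched as follows. This is a generic illustration, not the authors' implementation. The function name, the list-based representation of $v$ and $z$, and the sample transition and parameter values are assumptions made for the example.

```python
def td_lambda_step(v, z, s, r, s_next, alpha, gamma, lam):
    """One tabular TD(lambda) update with accumulating traces.

    v: current state-value estimates, z: eligibility traces,
    (s, r, s_next): observed transition and reward.
    """
    # Temporal difference, computed from the current iterate v.
    d = r + gamma * v[s_next] - v[s]
    # Decay all traces, then bump the trace of the visited state.
    z = [lam * gamma * zi for zi in z]
    z[s] += 1.0
    # Move every state value along its trace.
    v = [vi + alpha * zi * d for vi, zi in zip(v, z)]
    return v, z


# Hypothetical three-state example with a single observed transition 0 -> 1.
v = [0.0, 0.0, 0.0]
z = [0.0, 0.0, 0.0]
v, z = td_lambda_step(v, z, s=0, r=1.0, s_next=1, alpha=0.5, gamma=0.9, lam=0.8)
print(v)  # [0.5, 0.0, 0.0]
print(z)  # [1.0, 0.0, 0.0]
```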
Proof The update rule of the algorithm for 1-MDPs as applied to a k-MDP in the
above notation is given by
$$z := \lambda\gamma z + e_s, \qquad v := v + \alpha z d_v,$$
wherein $v : S \to \mathbb{R}$ denotes the current iterate for the approximate 1-MDP
state-value function, and the temporal difference
$$d_v := r + \gamma v_{s'} - v_s,$$
with r signifying the most recently incurred reward, whereas, in the basis function
view, the considered approximate algorithm for a k-MDP may be stated as
$$z := \gamma\lambda z + e_{s,(\ldots)}, \qquad \theta := \theta + \alpha \Phi^T z d_{\Phi(\ldots)},$$
in which $\theta$, $\Phi$ are as specified above, and the temporal difference