Theorem 6.2 Let the prerequisites for convergence of TD(λ) of Theorem 3.2 be
satisfied. Moreover, let $B^{-1}$ be a symmetric and positive definite (spd) $N \times N$-matrix.
Then the preconditioned TD(λ)-method

$$w := w + \alpha_t B^{-1} z_t d_t \qquad (6.19)$$

converges as well.
Since $C_t^{-1}$ is an spd $N \times N$-matrix, the hierarchically preconditioned TD(λ)
converges.
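The update (6.19) can be illustrated with a minimal sketch. It assumes that w and the eligibility trace z_t are stored as dense NumPy vectors, that the temporal difference d_t is a scalar, and that B^{-1} is available as an explicit matrix. The function names `is_spd` and `preconditioned_td_step` and the toy data are hypothetical and serve illustration only.

```python
import numpy as np


def is_spd(m, tol=1e-10):
    """Return True if m is a symmetric positive definite square matrix."""
    m = np.asarray(m, dtype=float)
    if m.ndim != 2 or m.shape[0] != m.shape[1]:
        return False
    if not np.allclose(m, m.T, atol=tol):
        return False
    try:
        # Cholesky succeeds only for positive definite input.
        np.linalg.cholesky(m)
    except np.linalg.LinAlgError:
        return False
    return True


def preconditioned_td_step(w, z, d, alpha, b_inv):
    """One step w := w + alpha_t * B^{-1} z_t * d_t, cf. (6.19)."""
    return w + alpha * d * (b_inv @ z)


# Toy data (assumed): three action values and a diagonal spd preconditioner.
w = np.zeros(3)
z = np.array([1.0, 0.5, 0.0])       # eligibility trace z_t
b_inv = np.diag([1.0, 2.0, 4.0])    # stands in for B^{-1}
assert is_spd(b_inv)
w_new = preconditioned_td_step(w, z, d=2.0, alpha=0.1, b_inv=b_inv)
print(w_new)
```

With the identity matrix as `b_inv`, the step reduces to the unpreconditioned TD(λ) update.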
The preconditioned TD(λ) algorithm (6.18), however, operates in terms of action
values. Hence, the inter-level operators are needed for state-action pairs (s, a) rather
than for single states. Yet how can we define hierarchies of actions? Since in the
recommendation approach the spaces S and A are isomorphic (4.1), actions may be
treated in the same way as states, and the same inter-level operators may be used
for the former.
While the states in S correspond to products that are endowed with recommendations,
the actions in A correspond to the recommended products. Thus, similarly to
( 6.11 ), the following definition of the prolongator I 1 suggests itself:

$$I^{1}_{ij\beta\gamma} = \begin{cases} 1, & i \in G_\beta \wedge j \in H_\gamma(i), \\ 0, & \text{else.} \end{cases} \qquad (6.20)$$

Here, the aggregations $H_\gamma(i)$, $\gamma = 1, \ldots, m_i$, refer to $A(s_i)$, that is, all actions
executable in state $s_i$. Thus, the prolongation matrix is a block-diagonal matrix,
where the blocks correspond to states and the block values to the actions.
Subsequently, we shall address a modification of the prolongator $I^{1}$. This
weighted prolongator is defined as follows:
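
$$\tilde{I}^{1}_{ij\beta\gamma} = \begin{cases} \omega_{ij}, & i \in G_\beta \wedge j \in H_\gamma(i), \\ 0, & \text{else,} \end{cases} \qquad (6.21)$$

with a weight $\omega_{ij}$ formed from $|A(s_i)|$ and $|A(a_j)|$.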
Here, | A ( s i )| denotes the number of all actions in state s i , that is, all rules for the
corresponding product, and | A ( a j )| the number of actions in the state associated with a j ,
that is, the rules with the associated product for a conclusion. This weighted prolongator
thus prefers rules with “strong” prerequisite or subsequent products, respectively.
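To make the index structure of the prolongator (6.20) concrete, the following sketch assembles it as a dense matrix whose rows are the fine state-action pairs (i, j) and whose columns are the coarse pairs (β, γ). Several points are assumptions rather than part of the text: products are numbered from 0, a single hypothetical list `group` assigns every product to its aggregate (reused for actions because S and A are isomorphic), and H_γ(i) is taken to be the set of actions in A(s_i) whose product lies in aggregate γ. The optional `weight` callable is only a placeholder for a weighting of the nonzero entries; it does not prescribe any particular formula.

```python
import numpy as np


def build_prolongator(actions, group, weight=None):
    """Assemble a prolongator with rows (i, j) and columns (beta, gamma).

    actions[i] lists the actions j executable in state s_i, i.e. A(s_i).
    group[k] is the aggregate index of product k (assumed data layout).
    weight(i, j), if given, replaces the entry 1 (hypothetical hook).
    """
    # Fine level: all state-action pairs, ordered state by state.
    fine = [(i, j) for i, acts in enumerate(actions) for j in acts]
    # Coarse level: all aggregate pairs that actually occur.
    coarse = sorted({(group[i], group[j]) for i, j in fine})
    column = {pair: c for c, pair in enumerate(coarse)}

    p = np.zeros((len(fine), len(coarse)))
    for row, (i, j) in enumerate(fine):
        value = 1.0 if weight is None else float(weight(i, j))
        p[row, column[(group[i], group[j])]] = value
    return p, fine, coarse


# Toy data (assumed): four products in two aggregates.
group = [0, 0, 1, 1]
actions = [[1, 2], [0], [3], [2, 0]]
p, fine, coarse = build_prolongator(actions, group)
print(p.shape)
```

Each row of the unweighted matrix contains exactly one nonzero entry, since every fine pair belongs to exactly one coarse pair.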
In general, one can derive multiple hierarchies from the product specifications,
for example, by means of shop hierarchies, commodity groups, and product attributes.
Consequently, a corresponding preconditioner $C_i$ can be derived for each
hierarchy according to (6.17). This gives rise to the question of whether
preconditioners can also be applied in a combined fashion. Indeed, this is possible,
for example, with respect to the preconditioner $C_a^{-1}$:

$$C_a^{-1} = C_1^{-1} + C_2^{-1} + \ldots + C_n^{-1}, \qquad (6.22)$$

where $n$ denotes the number of hierarchies used. Since all of the preconditioners
$C_i^{-1}$ are spd and a sum of spd matrices is again spd, so is $C_a^{-1}$, and
convergence of preconditioned TD(λ) follows from Theorem 6.2.
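The combination (6.22) and the spd argument can be checked numerically on small examples. The sketch below assumes that the individual matrices C_i^{-1} are given explicitly as dense arrays; the function names and the two toy matrices are hypothetical.

```python
import numpy as np


def combine_preconditioners(c_invs):
    """Return C_a^{-1} = C_1^{-1} + ... + C_n^{-1}, cf. (6.22)."""
    mats = [np.asarray(c, dtype=float) for c in c_invs]
    if not mats:
        raise ValueError("at least one preconditioner is required")
    return np.sum(np.stack(mats), axis=0)


def smallest_eigenvalue(m):
    """Smallest eigenvalue of a symmetric matrix (positive iff spd)."""
    return float(np.linalg.eigvalsh(m)[0])


# Toy data (assumed): two spd matrices standing in for C_1^{-1} and C_2^{-1}.
c1_inv = np.array([[2.0, 1.0], [1.0, 2.0]])
c2_inv = np.diag([1.0, 3.0])
ca_inv = combine_preconditioners([c1_inv, c2_inv])

# The sum is symmetric and its smallest eigenvalue is positive.
print(np.allclose(ca_inv, ca_inv.T), smallest_eigenvalue(ca_inv) > 0)
```

The resulting matrix can then take the role of B^{-1} in the update (6.19).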