consisting of only one set, the update rule coincides with that for a classical MDP.
Hence, our factorization model incorporates the case where a k-MDP is approximated
by a 1-MDP model. This is of epistemic value for assessing
the quality of 1-MDP models in environments that actually satisfy a GMA with
k > 1. Specifically, with regard to recommendation environments, which, arguably,
may be assumed to be more accurately represented by a k-MDP, this insight
may enable us to assess the quality of the classical MDP models discussed in
the foregoing chapters. For example, we may obtain bounds on the modeling error
entailed by employing a classical MDP model from bounds on the approximation
error of the factorized representation. Admittedly, we are as yet in no position to
produce such error bounds here. Hence, we leave the topic for future research.
10.4 Factored Representation and Computation of the State Values
10.4.1 A Model-Based Approach
In the following, we shall be interested in approximations of the form
$$
v^{\bar{s}}_{s} \approx \sum_{\beta=1}^{m} u_{\bar{s}\beta}\,\theta_{s\beta} = \theta_{s\beta(\bar{s})} \tag{10.9}
$$
to the state-value function. Here, U denotes an aggregation prolongator as
introduced in Equation (10.5). In order to solve the Bellman equation (10.1)
approximately, we devise the least squares approach
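$$
\min_{\theta} \sum_{\bar{s},\,s} \Bigl( \theta_{s\beta(\bar{s})} - \gamma \sum_{s' \in S} c_{ss'\beta(\bar{s})}\,\theta_{s'\beta(\bar{s}')} - b^{\bar{s}}_{s} \Bigr)^{2}, \tag{10.10}
$$
which is obtained by inserting the factorized representations (10.9) and (10.4), with
U taken to be the aggregation prolongator defined in (10.5), for v and P in the least
squares version of (10.1),
$$
\min_{v} \sum_{\bar{s},\,s} \Bigl( v^{\bar{s}}_{s} - \gamma \sum_{s' \in S} p^{\bar{s}}_{ss'}\,v^{\bar{s}'}_{s'} - b^{\bar{s}}_{s} \Bigr)^{2}.
$$
Here, $\bar{s}'$ denotes the history that succeeds $\bar{s}$ once the current state $s$ has been appended to it.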
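To illustrate the representation (10.9), the following minimal sketch assumes that the aggregation prolongator U carries exactly one unit entry per row, so that the sum over the aggregates collapses to a single table lookup. The sizes, the assignment `beta` of histories to aggregates, and all variable names are hypothetical and serve illustration only. The last lines show the special case of a single aggregate, in which the values no longer depend on the history, as in a classical MDP.

```python
import numpy as np

rng = np.random.default_rng(0)
n_hist, n_states, m = 6, 3, 2          # toy sizes (assumed)

# Hypothetical assignment of each history to one of m aggregates.
beta = np.array([0, 1, 1, 0, 1, 0])

# Aggregation prolongator: one unit entry per row (assumed structure).
U = np.zeros((n_hist, m))
U[np.arange(n_hist), beta] = 1.0

# Parameters theta[s, b]: one value per state and aggregate.
theta = rng.random((n_states, m))

# Sum form: v[h, s] = sum_b U[h, b] * theta[s, b]
v_sum = U @ theta.T
# Lookup form: v[h, s] = theta[s, beta[h]]
v_lookup = theta[:, beta].T

# Single aggregate (m = 1): every history maps to the same value vector.
U_one = np.ones((n_hist, 1))
theta_one = rng.random((n_states, 1))
v_one = U_one @ theta_one.T
```

Both forms yield the same array, so in practice the lookup may be used and U need never be stored explicitly.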
As regards practical computation in a recommendation framework, one may
proceed as follows: first, the core tensor C is estimated from observation by means
of the updating procedure (10.7). Then, Equation (10.10) may be solved by
means of numerical linear algebra.
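The second step may be sketched as follows for a toy problem. This is an illustration under stated assumptions, not the procedure of the text: we assume k = 2, so that a history consists of the previous state only and the successor history is simply the current state. The core tensor `C` is a random row-stochastic stand-in rather than an estimate obtained from (10.7), and its index layout, the rewards `b`, the aggregate map `beta`, and all other names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, gamma = 4, 2, 0.9            # states, aggregates, discount (toy values)

# Hypothetical aggregate index of each one-step history (previous state).
beta = np.array([0, 1, 0, 1])

# Stand-in core tensor C[s, s_next, b], normalised over s_next.
# In practice it would be estimated from observations instead.
C = rng.random((n, n, m))
C /= C.sum(axis=1, keepdims=True)

# Stand-in expected rewards b[h, s] for history h and current state s.
b = rng.random((n, n))

def col(s, a):
    """Column index of the unknown theta[s, a] in the flattened vector."""
    return s * m + a

# One residual row per augmented state (h, s):
# theta[s, beta[h]] - gamma * sum_s' C[s, s', beta[h]] * theta[s', beta[s]] - b[h, s]
A = np.zeros((n * n, n * m))
rhs = np.zeros(n * n)
for h in range(n):
    for s in range(n):
        row = h * n + s
        A[row, col(s, beta[h])] += 1.0
        for s_next in range(n):
            # Assumption k = 2: the successor history is the current state s.
            A[row, col(s_next, beta[s])] -= gamma * C[s, s_next, beta[h]]
        rhs[row] = b[h, s]

# Linear least squares solve for the flattened parameter vector.
theta_flat, *_ = np.linalg.lstsq(A, rhs, rcond=None)
theta = theta_flat.reshape(n, m)
residual = A @ theta_flat - rhs
```

The system has one row per augmented state but only n·m unknowns, which is where the reduction in dimension becomes visible. For realistic state spaces a sparse matrix and an iterative solver would presumably replace the dense arrays used here.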