defined in the same way as in Sect. 4.2 (except that here m denotes the number of recommendations instead of k, which now is the number of preceding states). Then
we arrive at our complete probability space:
$$ P := \Bigl( P^{\,a_1,\dots,a_m}_{(s_1;\dots;s_l)\,ss'} \Bigr)_{s_1,\dots,s_l,\,s,\,s' \in S;\; a_1,\dots,a_m \in A;\; l < k} \;\in\; \mathbb{R}^{S \times \dots \times S \times A} \qquad (10.13) $$
The problem is the high dimensionality of P. For l = k − 1 the dimension is k + m + 2. The same applies to the reward space $R \in \mathbb{R}^{S \times \dots \times S \times A}$; even with Assumption 4.2 we get $R \in \mathbb{R}^{S \times \dots \times S}$. The action-value function also belongs to $\mathbb{R}^{S \times \dots \times S \times A}$ and the state-value function to $\mathbb{R}^{S \times \dots \times S}$.
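To get a feeling for the orders of magnitude involved, the following small Python sketch counts the entries of the full transition tensor; the cardinalities n_S and n_A are purely hypothetical and serve only to show how quickly the full tensor becomes intractable.

    # Rough size of the full transition tensor (all numbers are assumptions).
    n_S = 10_000        # hypothetical number of states
    n_A = 1_000         # hypothetical number of recommendable items
    k, m = 3, 2         # preceding states considered, recommendations per step

    entries = n_S ** 2 * n_A ** m    # current state, successor state, m recommendations
    entries *= n_S ** (k - 1)        # one further state axis per additional preceding state
    print(f"full tensor would have about {entries:.1e} entries")  # ~1e22 for these numbers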
Let us focus on the most complex quantity, the transition probability (10.13). The best way would be an approximation through a tensor of dimension k + m + 2. The general tensor approach is described in Chap. 9. Unfortunately, this is an extremely difficult task, both because of the complexity of the decomposition algorithms and because of the prediction quality of the model. We therefore look for a more specific approach: we use separate models for the approximation in the state and action dimensions, i.e., we first seek an approximation in the state space and then add the approximation in the action space.
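The separation can be pictured as composing two independent component models. The following sketch is purely schematic: the function signatures and the multiplicative combination are our own illustration, not the book's construction; concrete stand-ins for the two components are sketched after the next paragraphs.

    # Schematic composition of a state-space approximation with an action-space
    # correction. Both components are supplied from outside; the multiplicative
    # combination is only one illustrative choice.
    def combined_transition(state_model, action_model):
        def p(history, s, s_next, recommendations):
            return state_model(history, s, s_next) * action_model(s, s_next, recommendations)
        return p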
For the state space we can use tensor approximations as in Chap. 9 or the specific one presented in Sect. 10.2. If this is still too difficult, we ignore the previous states, i.e., we consider k = 1. In this case P is a matrix. So we can either apply the matrix
factorization of Chap. 8 to P or calculate it directly.
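For the k = 1 case, here is a minimal sketch of both options. The session log format (integer-coded state pairs) is an assumption, and the truncated SVD merely stands in for the factorization methods of Chap. 8.

    import numpy as np

    # Direct estimation of the state-transition matrix P from observed
    # consecutive state pairs (s, s_next), encoded as integer indices.
    def estimate_P(transitions, n_S):
        counts = np.zeros((n_S, n_S))
        for s, s_next in transitions:
            counts[s, s_next] += 1
        row_sums = counts.sum(axis=1, keepdims=True)
        return np.divide(counts, row_sums,
                         out=np.zeros_like(counts), where=row_sums > 0)

    # Low-rank approximation of P (rank r) via truncated SVD,
    # standing in for the factorization methods of Chap. 8.
    def low_rank(P, r):
        U, sigma, Vt = np.linalg.svd(P, full_matrices=False)
        return (U[:, :r] * sigma[:r]) @ Vt[:r, :]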
To bring in the actions, we proceed as in Sect. 5.2 using the empirical Assumption 5.2. In the case of multiple recommendations, we additionally need the framework of Sect. 4.2, which is based on Assumption 4.3. The combination of both for calculating transition probabilities has been demonstrated in Sect. 5.2.3. Similar considerations can be undertaken for the other quantities, such as the transition rewards, the action-value function, and the state-value function.
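Assumptions 4.3 and 5.2 are not restated here, so the following is only an illustrative stand-in for the kind of combination meant: the displayed recommendations reweight the state-only transition probabilities toward the recommended successor states, followed by renormalization. The boost factor and the reweighting rule are assumptions of this sketch, not the book's formulas.

    import numpy as np

    # Illustrative only: combine the state-only row P(. | s) with the effect of
    # showing the recommendations a_1, ..., a_m. The multiplicative boost is a
    # placeholder for the empirical assumptions of Sects. 4.2 and 5.2.
    def row_with_recommendations(P_row, recommended_states, boost=1.5):
        q = np.asarray(P_row, dtype=float).copy()
        q[list(recommended_states)] *= boost   # recommended successors become more likely
        total = q.sum()
        return q / total if total > 0 else q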
Finally, the hierarchical methods presented in Chap. 6 allow us to increase the convergence speed.
This way we have developed a complete tool set to handle reinforcement
learning for recommendations. The different approaches can be combined in
numerous ways. Of course, many of them still need to be refined, and the question of their best combination remains open. The answer, again, depends
on the properties of the different approaches.
10.7 Summary
In this chapter, we have proposed a particular way to combine the factorization-based
approach to recommendation with the control-theoretic one. We stress that this is
only one specific manner in which the two paradigms may interact with each other,
and there are certainly numerous fundamentally different possible connections.