defined in the same way as in Sect. 4.2 (except that here m denotes the number of recommendations instead of k, which now is the number of preceding states). Then
we arrive at our complete probability space:
$$ P := \Bigl( P^{\,a_1,\dots,a_m}_{(s_1;\dots;s_l)\,ss'} \Bigr)_{s_1,\dots,s_l,\,s,\,s' \in S;\; a_1,\dots,a_m \in A;\; l < k} \;\in\; \mathbb{R}^{S \times \dots \times S \times A} \qquad (10.13) $$
The problem is the high dimensionality of P. For l = k − 1 the dimension is k + m + 2. The same applies to the reward space $R \in \mathbb{R}^{S \times \dots \times S \times A}$; even with Assumption 4.2 we get $R \in \mathbb{R}^{S \times \dots \times S}$. The action-value function also belongs to $\mathbb{R}^{S \times \dots \times S \times A}$ and the state-value function to $\mathbb{R}^{S \times \dots \times S}$.
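To get a feeling for the orders of magnitude involved, the following small Python sketch counts the entries of the full transition tensor; the cardinalities n_S and n_A are purely hypothetical and serve only to show how quickly the full tensor becomes intractable.

    # Rough size of the full transition tensor (all numbers are assumptions).
    n_S = 10_000        # hypothetical number of states
    n_A = 1_000         # hypothetical number of recommendable items
    k, m = 3, 2         # preceding states considered, recommendations per step

    entries = n_S ** 2 * n_A ** m    # current state, successor state, m recommendations
    entries *= n_S ** (k - 1)        # one further state axis per additional preceding state
    print(f"full tensor would have about {entries:.1e} entries")  # ~1e22 for these numbers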
Let us focus on the most complex quantity, the transition probability (10.13). The best way would be an approximation through a tensor of dimension k + m + 2. The general tensor approach is described in Chap. 9. Unfortunately, this is an extremely difficult task, both because of the complexity of the decomposition algorithms and because of the prediction quality of the model. We therefore look for a more specific approach: we use separate models for the approximation in the state and action dimensions, i.e., we first seek an approximation in the state space and then add the approximation in the action space.
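The separation can be pictured as composing two independent component models. The following sketch is purely schematic: the function signatures and the multiplicative combination are our own illustration, not the book's construction; concrete stand-ins for the two components are sketched after the next paragraphs.

    # Schematic composition of a state-space approximation with an action-space
    # correction. Both components are supplied from outside; the multiplicative
    # combination is only one illustrative choice.
    def combined_transition(state_model, action_model):
        def p(history, s, s_next, recommendations):
            return state_model(history, s, s_next) * action_model(s, s_next, recommendations)
        return p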
For the state space we can use tensor approximations as in Chap. 9 or the specific one presented in Sect. 10.2. If this is still too difficult, we ignore the previous states, i.e., we consider k = 1. In this case P is a matrix. So we can either apply the matrix
factorization of Chap. 8 to P or calculate it directly.
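For the k = 1 case, here is a minimal sketch of both options. The session log format (integer-coded state pairs) is an assumption, and the truncated SVD merely stands in for the factorization methods of Chap. 8.

    import numpy as np

    # Direct estimation of the state-transition matrix P from observed
    # consecutive state pairs (s, s_next), encoded as integer indices.
    def estimate_P(transitions, n_S):
        counts = np.zeros((n_S, n_S))
        for s, s_next in transitions:
            counts[s, s_next] += 1
        row_sums = counts.sum(axis=1, keepdims=True)
        return np.divide(counts, row_sums,
                         out=np.zeros_like(counts), where=row_sums > 0)

    # Low-rank approximation of P (rank r) via truncated SVD,
    # standing in for the factorization methods of Chap. 8.
    def low_rank(P, r):
        U, sigma, Vt = np.linalg.svd(P, full_matrices=False)
        return (U[:, :r] * sigma[:r]) @ Vt[:r, :]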
To bring in the actions, we proceed as in Sect. 5.2 using the empirical Assumption 5.2. In the case of multiple recommendations, we additionally need the framework of Sect. 4.2, which is based on Assumption 4.3. The combination of both for calculating transition probabilities has been demonstrated in Sect. 5.2.3. Similar considerations can be undertaken for the other quantities, such as the transition rewards, the action-value function, and the state-value function.
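Assumptions 4.3 and 5.2 are not restated here, so the following is only an illustrative stand-in for the kind of combination meant: the displayed recommendations reweight the state-only transition probabilities toward the recommended successor states, followed by renormalization. The boost factor and the reweighting rule are assumptions of this sketch, not the book's formulas.

    import numpy as np

    # Illustrative only: combine the state-only row P(. | s) with the effect of
    # showing the recommendations a_1, ..., a_m. The multiplicative boost is a
    # placeholder for the empirical assumptions of Sects. 4.2 and 5.2.
    def row_with_recommendations(P_row, recommended_states, boost=1.5):
        q = np.asarray(P_row, dtype=float).copy()
        q[list(recommended_states)] *= boost   # recommended successors become more likely
        total = q.sum()
        return q / total if total > 0 else q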
Finally, the hierarchical methods presented in Chap. 6 allow us to increase the convergence speed.
This way we have developed a complete tool set to handle reinforcement
learning for recommendations. The different approaches can be combined in
numerous ways. Of course, many of them still need to be refined, and the question of their best combination remains open. The answer, again, depends
on the properties of the different approaches.
10.7 Summary
In this chapter, we have proposed a particular way to combine the factorization-based
approach to recommendation with the control-theoretic one. We stress that this is
only one specific manner in which the two paradigms may interact with each other,
and there are certainly numerous fundamentally different possible connections.