Recommendations as a Game: Reinforcement Learning for Recommendation Engines - Realtime Data Mining

This stability result means that the less our model assumptions are violated, the

better our observed probabilities can be estimated. Similar reasoning applies to the

methods for the estimation of transition probabilities presented in Sect. 5.2.3.

4.3 Remarks on the Modeling

In what follows, we would like to study the remarks from Sect. 3.9 with regard to

the above devised model of recommendation engines.

As we are dealing with recommendation engines with episodic tasks, the question

of the terminal state arises. Indeed, we have already encountered it in Figs. 4.2 and

4.3. It is meaningful for two reasons: first, it has a clear interpretation with

regard to content, as it assigns to each product the probability that a user terminates

the session afterward, i.e., leaves the shop. Second, it is relevant for the computation

of transition probabilities. According to (3.2), these must sum to one, as P is

stochastic. One could, of course, ignore the end of the session when computing P, but

this would result in distorted transition probabilities. Since the latter are multiplied

with the rewards in the Bellman equation (3.4), their actual magnitude matters.

As opposed to all the other states, which represent products, there is, of course,

no action associated with the terminal state, since such an action would amount to

suggesting that the user leave the shop.
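
The effect of ignoring the end of the session can be illustrated with a minimal sketch. It assumes a deliberately simplified setting: transition probabilities are estimated as plain relative frequencies of observed product-to-product transitions, without distinguishing recommended from non-recommended transitions. The names `TERMINAL`, `estimate_transitions`, and the toy sessions are hypothetical and not taken from the text; the estimation methods of Sect. 5.2.3 are not reproduced here.

```python
from collections import Counter, defaultdict

TERMINAL = "<END>"  # hypothetical label for the terminal state


def estimate_transitions(sessions, include_terminal=True):
    """Estimate p_ss' as relative frequencies of observed transitions.

    sessions: list of product sequences, one sequence per session.
    If include_terminal is True, each session is closed by a transition
    from its last product to the terminal state.
    """
    counts = defaultdict(Counter)
    for session in sessions:
        path = list(session) + ([TERMINAL] if include_terminal else [])
        for s, s_next in zip(path, path[1:]):
            counts[s][s_next] += 1
    # Normalize every row so that it sums to one (P is stochastic).
    return {
        s: {s_next: n / sum(row.values()) for s_next, n in row.items()}
        for s, row in counts.items()
    }


# Toy data (assumption): three sessions over products A, B, C.
sessions = [["A", "B"], ["A", "B", "C"], ["A", "C"]]
with_end = estimate_transitions(sessions, include_terminal=True)
without_end = estimate_transitions(sessions, include_terminal=False)

print(with_end["B"])     # B -> C and B -> <END> share the probability mass
print(without_end["B"])  # B -> C absorbs all of the mass
```

In this toy data, one of the two visits to B ends the session. Leaving out the terminal state therefore doubles the estimate for the transition from B to C, which is exactly the kind of distortion that then propagates into the Bellman equation.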

Another issue is the question of whether it is meaningful to consider recommendations

from products to themselves, i.e., p_ss. This corresponds to the representation

of rules of the form s → s. We remind the reader that p_ss > 0 is a sufficient condition for

primitivity of the matrix P (together with irreducibility, which we shall address later

on). At first glance, these rules do not convey much information; they only

signify that the user repeatedly calls the product up, i.e., hits the refresh button.

(This is different when we operate on the level of categories as in Chap. 6.) On the

other hand, they are, for the same reasons as the terminal state, relevant with regard

to the computation of transition probabilities. Hence, it is advisable to use these

transitions internally. They must not, however, serve as recommendations,

as they would give rise to products recommending themselves.
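
A minimal sketch of this separation between internal use and output follows. It assumes, purely for illustration, that candidates are ranked by their transition probability; the ranking criterion, the function name `recommend`, the label `"<END>"`, and the toy row of P are hypothetical and not taken from the text.

```python
def recommend(P, s, k=3, terminal="<END>"):
    """Return up to k recommendations for product s.

    P maps each state to a dict of successor probabilities. The row of s
    is left untouched, so self-transitions and the terminal state still
    contribute to the normalization; they are only filtered from the output.
    """
    row = P.get(s, {})
    candidates = {t: p for t, p in row.items() if t != s and t != terminal}
    return sorted(candidates, key=candidates.get, reverse=True)[:k]


# Toy row (assumption): A has a self-transition and a terminal transition.
P = {"A": {"A": 0.2, "B": 0.5, "C": 0.2, "<END>": 0.1}}
recs = recommend(P, "A")
print(recs)
```

The row for A keeps its full probability mass, while neither A itself nor the terminal state can appear among the recommendations.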

Finally, let us turn to the question of irreducibility. In most practical applications,

P is not irreducible. In Chaps. 6, 7, 8, and 9 on hierarchical methods and

factorizations, we shall, however, deal with procedures that make it possible to compute an

almost unlimited number of recommendations for each product, i.e., transitions

satisfying p_ss' > 0. This may easily be exploited to render P irreducible. At the

same time, reducibility may also have positive effects since it decomposes the

global problem into uncoupled subproblems. An example is Theorem 6.1 about

the convergence of the multigrid method. Thus, whether irreducibility is desirable has to

be assessed for the method in use.
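
Whether a given P is irreducible can be checked on its transition graph: P is irreducible exactly when every state can reach every other state along transitions with positive probability, i.e., when the graph is strongly connected. The following is a minimal sketch of such a check restricted to product states; the function names and toy matrices are hypothetical, and how the terminal state should be treated in this check is left open here.

```python
from collections import deque


def reachable(adj, start):
    """Return the set of nodes reachable from start by breadth-first search."""
    seen = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, ()):
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return seen


def is_irreducible(P):
    """Check strong connectivity of the graph with an edge s -> t iff p_st > 0.

    P maps each state to a dict of successor probabilities.
    """
    states = set(P) | {t for row in P.values() for t in row}
    if not states:
        return True
    forward = {s: [t for t, p in P.get(s, {}).items() if p > 0] for s in states}
    backward = {s: [] for s in states}
    for s, targets in forward.items():
        for t in targets:
            backward[t].append(s)
    start = next(iter(states))
    # Strongly connected iff one node reaches all nodes and is reached by all.
    return reachable(forward, start) == states and reachable(backward, start) == states


# Toy examples (assumptions): a single cycle versus two uncoupled blocks.
P_cycle = {"A": {"B": 1.0}, "B": {"C": 1.0}, "C": {"A": 1.0}}
P_blocks = {"A": {"B": 1.0}, "B": {"A": 1.0}, "C": {"D": 1.0}, "D": {"C": 1.0}}
print(is_irreducible(P_cycle), is_irreducible(P_blocks))
```

The second example is reducible: the blocks {A, B} and {C, D} never interact, so each could be treated as a separate subproblem.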

Let us summarize: it is reasonable to include the terminal state in the model of

the RE. In general, the same holds for cycles of length 1. Invoking special tools, it is

possible to ensure that P is irreducible. Thus, the essential conditions for convergence

of the TD algorithm are satisfied.
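
The role of cycles of length 1 for primitivity can be illustrated with a small sketch. It relies on the definition that a nonnegative matrix P is primitive if some power of P is entrywise positive, together with Wielandt's bound, which states that for an n x n matrix it suffices to test powers up to n^2 - 2n + 2. Only the zero pattern of P is used. The function name and the toy matrices are hypothetical.

```python
def is_primitive(P):
    """Check primitivity of a square nonnegative matrix given as nested lists.

    Only the zero pattern matters, so boolean matrix products are used.
    Powers 1 .. n^2 - 2n + 2 are tested (Wielandt's bound).
    """
    n = len(P)
    A = [[p > 0 for p in row] for row in P]
    M = A  # pattern of the current power, starting with the first power
    for _ in range(n * n - 2 * n + 2):
        if all(all(row) for row in M):
            return True
        # Boolean product: pattern of the next power.
        M = [
            [any(M[i][k] and A[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)
        ]
    return False


# Toy examples (assumptions): a pure 3-cycle, and the same cycle with p_ss > 0
# for the first state.
P_cycle = [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [1.0, 0.0, 0.0]]
P_loop = [[0.5, 0.5, 0.0], [0.0, 0.0, 1.0], [1.0, 0.0, 0.0]]
print(is_primitive(P_cycle), is_primitive(P_loop))
```

Both matrices are irreducible, but only the one with a self-transition is primitive, since the pure cycle is periodic.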
