How Engines Learn to Generate Recommendations: Adaptive Learning Algorithms - Realtime Data Mining


We designate the corresponding approach using unconditional transition probabilities as the unconditional or probabilistic approach, in contrast to the conditional approach using conditional probabilities. Below, we will derive expressions for both approaches and then combine them in a useful form.

5.1 Unconditional Approach

The probabilistic approach thus corresponds to the classical view described by Approach 1 in Chap. 2. We apply RL methods to it formally, for instance to factor in rewards and chain optimization; that is, we use the formal advantages of RL to broaden the classical approach. Thus, instead of simply recommending the product $s'$ that is bought most frequently after viewing $s$, we also incorporate its reward $r$. That seems logical. In addition, we give preference to those products which, including subsequent purchases by existing customers, lead to the highest sales. So chain optimization seems reasonable as well.
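The difference between the purely frequency-based choice and the reward-aware choice can be sketched as follows. This is only a minimal illustration: the product names, probabilities, and rewards are hypothetical, and ranking by the product of probability and reward is an assumption of this sketch for the case without chain optimization.

```python
# Hypothetical data: unconditional transition probabilities p[s'] and
# rewards r[s'] for follow-up products s' after a user has viewed product s.
p = {"A": 0.5, "B": 0.3, "C": 0.2}
r = {"A": 2.0, "B": 5.0, "C": 4.0}


def most_frequent(p):
    """Classical choice: the follow-up product with the highest probability."""
    return max(p, key=p.get)


def highest_expected_reward(p, r):
    """Reward-aware choice: the follow-up product maximizing p[s'] * r[s']."""
    return max(p, key=lambda s_next: p[s_next] * r[s_next])


print(most_frequent(p))               # prints "A"
print(highest_expected_reward(p, r))  # prints "B"
```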

In the strict RL sense, that is nonsensical: we only learn the policy which users pursue on their own initiative, without regard to the recommendations, and we reinforce this. But as we have seen, generalized policy iteration at least offers a general justification of this approach. And the results are good in practice.

The approach thus tends to implement the recommendations with the highest unconditional transition probabilities and action values. Let us start with the “simple” case (3.5). The direct use of (3.5) would take us no further forward here, since

$$q^{\pi}(s,a) = \sum_{s'} p^{a}_{ss'}\, r^{a}_{ss'} = \sum_{s'} p_{ss'}\, r_{ss'} = q^{0}(s)$$

would yield the same action value for all recommendations $a$. In fact, if the transition probabilities were independent of issuing recommendations, one would not need any recommendations at all. Therefore, we make the following assumption:

Assumption 5.1 (Unconditional probability property): For each state transition from $s$ to the state $s_a$ associated with the action $a$, the transition probability $p^{a}_{s s_a}$ is considered proportional to the unconditional probability $p_{s s_a}$, i.e., $p^{a}_{s s_a} = d\, p_{s s_a}$ with the factor $d > 1$. For any other state transition from $s$ to $s'$ under the action $a$, the transition probability $p^{a}_{ss'}$ is likewise considered proportional to the unconditional probability $p_{ss'}$, i.e., $p^{a}_{ss'} = c\, p_{ss'}$, but with the factor $c < 1$.
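The effect of Assumption 5.1 on the simple-case action value can be illustrated with a small sketch. The function name `action_value`, the sample data, and the values chosen for the factors $d$ and $c$ are hypothetical. The sketch further assumes that rewards do not depend on the action, that each recommendation is identified with its product $s_a$, and that the scaled probabilities are not renormalized.

```python
def action_value(p, r, s_a=None, d=1.0, c=1.0):
    """Simple-case action value: sum over s' of p^a_{ss'} * r_{ss'}.

    p, r : dicts of unconditional probabilities and rewards per successor s'.
    s_a  : product tied to the recommendation a (None: a has no influence).
    d, c : scaling factors as in Assumption 5.1 (d > 1 for s_a, c < 1 otherwise).
    Assumptions of this sketch: rewards are independent of a, and the scaled
    probabilities are not renormalized.
    """
    total = 0.0
    for s_next, prob in p.items():
        factor = d if s_next == s_a else c
        total += factor * prob * r[s_next]
    return total


# Hypothetical unconditional probabilities and rewards for successors of s.
p = {"A": 0.5, "B": 0.3, "C": 0.2}
r = {"A": 2.0, "B": 5.0, "C": 4.0}

# If recommendations had no influence, every action would get the same value.
q_independent = {a: action_value(p, r) for a in p}

# Under Assumption 5.1 (hypothetical d = 1.5, c = 0.9) the values differ.
q_assumption = {a: action_value(p, r, s_a=a, d=1.5, c=0.9) for a in p}
best = max(q_assumption, key=q_assumption.get)

print(q_independent)
print(q_assumption)
print(best)
```

With fixed factors, the ranking of the recommendations in this sketch is determined by the term $p_{s s_a} r_{s s_a}$, since the remaining contribution is the same for every action.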

In other words, delivering recommendation $a$ increases the probability of the transition to the product $s_a$ associated with it. The higher the unconditional
