How Engines Learn to Generate Recommendations: Adaptive Learning Algorithms - Realtime Data Mining


Conversely, it may occur, though rather seldom in practice, that the response to the recommendation $a$ leads to a decrease in the associated transition probability (for instance, through cannibalization among multiple product recommendations). We then have $p^a_{ss_a} \ll p_{ss_a}$ and hence $c(s,a) \gg 1$: the unconditional action value would be extremely strongly weighted, an apparently absurd effect. However, it should be noted here that because of the relationship $\sum_{s' \neq s_a} p_{ss'} = 1 - p_{ss_a}$, a large $p_{ss_a}$ leads to a small unconditional action value, and we must perform a limit value consideration here. We will explore this in more depth in the course of the special cases of (5.6).
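The limit consideration can be made concrete with a small numerical sketch. It assumes the weight has the form $c(s,a) = (1 - p^a_{ss_a}) / (1 - p_{ss_a})$ and, purely for illustration, sets all rewards to 1; the function names and the probability values are hypothetical. Under these assumptions the weight grows without bound as $p_{ss_a}$ approaches 1, but the weighted remainder $c(s,a) \sum_{s' \neq s_a} p_{ss'}$ stays at $1 - p^a_{ss_a}$.

```python
def weight(p_rec: float, p_uncond: float) -> float:
    """Assumed form of c(s, a): (1 - p^a_{ss_a}) / (1 - p_{ss_a})."""
    return (1.0 - p_rec) / (1.0 - p_uncond)


def weighted_rest(p_rec: float, p_uncond: float) -> float:
    """c(s, a) * sum_{s' != s_a} p_{ss'}, with all rewards set to 1 (assumption)."""
    rest_mass = 1.0 - p_uncond  # sum of p_{ss'} over all s' != s_a
    return weight(p_rec, p_uncond) * rest_mass


P_REC = 0.05  # hypothetical, strongly reduced probability under recommendation

# Let the unconditional probability p_{ss_a} approach 1.
rows = [(p, weight(P_REC, p), weighted_rest(P_REC, p))
        for p in (0.5, 0.9, 0.99, 0.999)]

for p_uncond, c, term in rows:
    print(f"p_uncond={p_uncond:<6} c={c:8.2f} weighted_rest={term:.4f}")
```

For nonnegative rewards that are not all equal, the same argument bounds the weighted remainder by $(1 - p^a_{ss_a})$ times the largest reward, so a large weight alone does not make the term explode.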

For a quantitatively better understanding of (5.6), let us consider for the product $s$ the difference between the action values of two recommendations $a$ and $b$:

$$
\begin{aligned}
q^\pi(s,a) - q^\pi(s,b)
&= p^a_{ss_a} r_{ss_a} + c(s,a) \sum_{s' \neq s_a} p_{ss'} r_{ss'} - p^b_{ss_b} r_{ss_b} - c(s,b) \sum_{s' \neq s_b} p_{ss'} r_{ss'} \\
&= p^a_{ss_a} r_{ss_a} - p^b_{ss_b} r_{ss_b} + c(s,a)\, p_{ss_b} r_{ss_b} - c(s,b)\, p_{ss_a} r_{ss_a} \\
&\quad + c(s,a) \sum_{s' \neq s_a, s_b} p_{ss'} r_{ss'} - c(s,b) \sum_{s' \neq s_a, s_b} p_{ss'} r_{ss'} \\
&= \bigl[ p^a_{ss_a} - c(s,b)\, p_{ss_a} \bigr] r_{ss_a} - \bigl[ p^b_{ss_b} - c(s,a)\, p_{ss_b} \bigr] r_{ss_b} + \bigl[ c(s,a) - c(s,b) \bigr] \sum_{s' \neq s_a, s_b} p_{ss'} r_{ss'} .
\end{aligned}
$$
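The rearrangement above can be checked numerically. The following is a minimal sketch, not the authors' implementation: it assumes the action value has the form $q^\pi(s,a) = p^a_{ss_a} r_{ss_a} + c(s,a) \sum_{s' \neq s_a} p_{ss'} r_{ss'}$ with $c(s,a) = (1 - p^a_{ss_a}) / (1 - p_{ss_a})$, and all state names, probabilities, and rewards are made-up illustration values.

```python
def c(p_rec: float, p_uncond: float) -> float:
    """Assumed weight c(s, y) = (1 - p^y_{ss_y}) / (1 - p_{ss_y})."""
    return (1.0 - p_rec) / (1.0 - p_uncond)


def q(target: str, p_rec: float, p: dict, r: dict) -> float:
    """Action value: recommended transition plus weighted unconditional rest."""
    rest = sum(p[s] * r[s] for s in p if s != target)
    return p_rec * r[target] + c(p_rec, p[target]) * rest


# Hypothetical unconditional transition probabilities p_{ss'} (sum to 1)
p = {"s_a": 0.20, "s_b": 0.10, "s_1": 0.40, "s_2": 0.25, "s_3": 0.05}
# Hypothetical transition rewards r_{ss'}
r = {"s_a": 3.0, "s_b": 8.0, "s_1": 1.0, "s_2": 2.0, "s_3": 5.0}
# Hypothetical conditional probabilities p^a_{ss_a} and p^b_{ss_b}
p_a, p_b = 0.30, 0.25

# Left-hand side: difference of the two action values, computed directly.
direct = q("s_a", p_a, p, r) - q("s_b", p_b, p, r)

# Right-hand side: the three-term form from the last line of the derivation.
c_a, c_b = c(p_a, p["s_a"]), c(p_b, p["s_b"])
others = sum(p[s] * r[s] for s in p if s not in ("s_a", "s_b"))
decomposed = ((p_a - c_b * p["s_a"]) * r["s_a"]
              - (p_b - c_a * p["s_b"]) * r["s_b"]
              + (c_a - c_b) * others)

print(direct, decomposed)
```

The two values agree up to floating-point rounding. The identity is purely algebraic, so it holds for any values of the two weights, not only for the assumed form of $c$.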

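By preliminary use of the estimate $c(s,a) = \frac{1 - p^a_{ss_a}}{1 - p_{ss_a}} \approx 1$ and similarly $c(s,b) \approx 1$, we obtain

$$
q^\pi(s,a) - q^\pi(s,b) \approx \underbrace{\bigl( p^a_{ss_a} - p_{ss_a} \bigr)}_{\Delta p_a} \underbrace{r_{ss_a}}_{r_a} - \underbrace{\bigl( p^b_{ss_b} - p_{ss_b} \bigr)}_{\Delta p_b} \underbrace{r_{ss_b}}_{r_b} = \Delta p_a r_a - \Delta p_b r_b . \tag{5.7}
$$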

If, for the sake of simplicity, we initially set all rewards to 1, we have

$$
q^\pi(s,a) - q^\pi(s,b) \approx \Delta p_a - \Delta p_b .
$$
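With unit rewards, comparing two recommendations thus reduces to comparing how much each one raises its own transition probability, $\Delta p_y = p^y_{ss_y} - p_{ss_y}$. The sketch below illustrates this under the approximation $c \approx 1$; the candidate names and probability estimates are invented for illustration and do not come from the text.

```python
# Hypothetical estimates for one product s:
# candidate y -> (p_{ss_y} without recommendation, p^y_{ss_y} with recommendation)
estimates = {
    "bestseller": (0.30, 0.32),
    "accessory": (0.05, 0.15),
    "niche_item": (0.02, 0.06),
}


def uplift(p_uncond: float, p_rec: float) -> float:
    """Delta p_y: increase in transition probability caused by recommending y."""
    return p_rec - p_uncond


# Choice by highest unconditional transition probability.
by_probability = max(estimates, key=lambda y: estimates[y][0])

# Choice by highest uplift, following the unit-reward approximation.
by_uplift = max(estimates, key=lambda y: uplift(*estimates[y]))

print("highest transition probability:", by_probability)
print("highest uplift:", by_uplift)
```

In this made-up data the two criteria pick different candidates: the item that is bought most often anyway gains little from being recommended, while a less frequent item gains the most.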

Since we can generally assume that for a product $s$ the probability of a transition to a product $s_y$ is higher if $y$ is recommended, we have $p^y_{ss_y} > p_{ss_y}$, and we obtain the following interpretation. The recommendation $a$ is then certainly better than the recommendation $b$ if the increase $\Delta p_a$ in the transition probability brought about by the product recommendation is greater than the corresponding increase $\Delta p_b$ for the recommendation $b$. Instead therefore of making recommendations $a$ with the highest transition probability $p_a$ ($:= p_{ss_a}$) as in classical data mining, the recommendations $a$ that are made are those with the highest difference between the
