How Engines Learn to Generate Recommendations: Adaptive Learning Algorithms - Realtime Data Mining


where $k$ is the number of all the other states $s' \neq s_a$. This is the most complicated case. The above equation follows from an elementary limit consideration with the equation $p_{ss'} = \frac{1 - p_{ss_a}}{k} \;\; \forall s' \neq s_a$:

$$\lim_{p_{ss_a} \to 1} \frac{1}{1 - p_{ss_a}} \sum_{s' \neq s_a} p_{ss'} \, r_{ss'} = \lim_{p_{ss_a} \to 1} \frac{1}{1 - p_{ss_a}} \sum_{s' \neq s_a} \frac{1 - p_{ss_a}}{k} \, r_{ss'} = \frac{1}{k} \sum_{s' \neq s_a} r_{ss'}.$$

Of course, the distribution of the unconditional probabilities $p_{ss'}$ may also be modeled differently (which may lead to a different result), but this is the most natural approach. The interpretation is as follows: since in the recommendation-free case the transition always leads to $s_a$, the distribution of the $p_{ss'}$ is unknown if, in the recommendation case, the transition leads to $s' \neq s_a$. Therefore, the $p_{ss'}$ are assumed to be equal. Hence, the action value corresponds to the conditional action value of recommendation $a$ plus the probability of nonacceptance of the recommendation times the average reward over the other states $s' \neq s_a$.
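
The limiting case just described can be illustrated with a minimal Python sketch. It assumes that the conditional acceptance probability and the rewards are already known. The function name `action_value_limit_case` and its parameter names are hypothetical and do not come from the text.

```python
from typing import Sequence


def action_value_limit_case(p_cond: float, r_accept: float,
                            r_other: Sequence[float]) -> float:
    """Action value when the recommendation-free transition always leads to s_a.

    p_cond   -- conditional probability that recommendation a is accepted
    r_accept -- reward of the transition s -> s_a
    r_other  -- rewards r_{ss'} of the k other states s' != s_a
    """
    if not 0.0 <= p_cond <= 1.0:
        raise ValueError("p_cond must be a probability")
    if len(r_other) == 0:
        raise ValueError("at least one other state s' != s_a is required")
    # Equal weights over the other states: (1/k) * sum of r_{ss'}
    avg_other = sum(r_other) / len(r_other)
    # Accepted part plus nonacceptance probability times the average reward
    return p_cond * r_accept + (1.0 - p_cond) * avg_other
```

For instance, with an acceptance probability of 0.2, a reward of 10 for the recommended state, and rewards 1 and 3 for two other states, the sketch adds 0.2 times 10 to 0.8 times the average reward 2.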

The approach described for estimation of the transition probabilities in (3.5) can now be applied similarly for a positive discount rate γ (3.6), so that in the result we again obtain an equation similar to (5.6), albeit of course more complex:

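$$q^{\pi}(s, a) = p^{a}_{ss_a} \Big[ r^{a}_{ss_a} + \gamma \sum_{a'} \pi(s_a, a') \, q^{\pi}(s_a, a') \Big] + c^{a}_{s} \sum_{s' \neq s_a} p_{ss'} \Big[ r_{ss'} + \gamma \sum_{a'} \pi(s', a') \, q^{\pi}(s', a') \Big]. \qquad (5.9)$$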

This is then once again solved in realtime using ADP. A corresponding TD algorithm is not easy to derive because of the nonlinearity of (5.8) with respect to $p_{ss_a}$. We leave this as an open problem. Nevertheless, we will also consider the TD version in the course of this topic, especially in Chaps. 6 and 10.
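
To make the structure of (5.9) more concrete, the following Python sketch evaluates a fixed policy by repeatedly applying its right-hand side. It is only an illustration under several assumptions that are not stated in the text: action `a` recommends the state with index `a`; the same reward matrix `r` is used for accepted and non-accepted transitions; and the scaling factor of the second term is taken to be (1 - p_cond) / (1 - p_uncond), which is the value that makes the transition probabilities under a recommendation sum to one. Plain synchronous fixed-point sweeps stand in for the ADP procedure mentioned above, and all names (`backup`, `evaluate`, `p_cond`, `p_uncond`) are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def backup(q: Matrix, p_uncond: Matrix, p_cond: Matrix, r: Matrix,
           pi: Matrix, gamma: float) -> Matrix:
    """Apply the assumed right-hand side of (5.9) once to every pair (s, a)."""
    n = len(q)
    # State value under the policy: v(s') = sum over a' of pi(s', a') * q(s', a')
    v = [sum(pi[s][a] * q[s][a] for a in range(n)) for s in range(n)]
    q_new = [[0.0] * n for _ in range(n)]
    for s in range(n):
        for a in range(n):  # assumption: action a recommends state s_a = a
            accept = p_cond[s][a] * (r[s][a] + gamma * v[a])
            others = [t for t in range(n) if t != a]
            rest = 1.0 - p_uncond[s][a]
            if rest > 1e-12:
                # Assumed scaling factor (1 - p^a_{ss_a}) / (1 - p_{ss_a})
                c = (1.0 - p_cond[s][a]) / rest
                reject = c * sum(p_uncond[s][t] * (r[s][t] + gamma * v[t])
                                 for t in others)
            elif others:
                # Limit case p_{ss_a} -> 1: equal weights over the other states
                reject = (1.0 - p_cond[s][a]) * sum(
                    r[s][t] + gamma * v[t] for t in others) / len(others)
            else:
                reject = 0.0
            q_new[s][a] = accept + reject
    return q_new


def evaluate(p_uncond: Matrix, p_cond: Matrix, r: Matrix, pi: Matrix,
             gamma: float, tol: float = 1e-10,
             max_sweeps: int = 10_000) -> Matrix:
    """Iterate the backup from q = 0 until successive sweeps differ by < tol."""
    n = len(p_uncond)
    q = [[0.0] * n for _ in range(n)]
    for _ in range(max_sweeps):
        q_new = backup(q, p_uncond, p_cond, r, pi, gamma)
        diff = max(abs(q_new[s][a] - q[s][a])
                   for s in range(n) for a in range(n))
        q = q_new
        if diff < tol:
            break
    return q
```

With the assumed scaling factor, the weights of the two terms sum to one for every pair (s, a), so for γ < 1 the sweeps contract toward a unique fixed point. The second branch reproduces the equal-weights limit case when the recommendation-free transition always leads to the recommended state.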

5.2.3 Estimation of Transition Probabilities

The conditional transition probabilities $p^{a}_{ss_a}$ may be computed from the transactions according to Algorithm 4.1 or, in case that multiple recommendations have been issued, Algorithm 4.2.

Computation of the unconditional transition probabilities $p_{ss'}$ has not yet been addressed. It may be calculated either from sessions in the control group (the topic
