Table 5.7 Frobenius error norms for simulations over virtual sessions with sometimes recommendations left out, 2 recommendations

                Linear                                    Nonlinear
#sessions       $\|\Delta p^c\|_F$   $\|\Delta p^u\|_F$   $\|\Delta p^c\|_F$   $\|\Delta p^u\|_F$
10              1.458                1.044                1.095                1.382
100             1.073                0.667                0.189                0.616
1,000           0.508                0.240                0.051                0.151
10,000          0.172                0.102                0.029                0.069
100,000         0.041                0.016                0.004                0.017
1,000,000       0.006                0.001                0.002                0.086
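The error norms in Table 5.7 shrink as the number of simulated sessions grows, roughly at the familiar Monte Carlo rate of one over the square root of the number of observed transitions. To make the notion of a Frobenius error norm concrete, the following sketch (purely illustrative Python; the 3-state chain, the session length, and all names are assumptions, not the book's simulation) estimates a transition matrix by counting transitions in simulated sessions and reports $\|\hat{P} - P\|_F$ for increasing numbers of sessions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 3-state chain; the true transition matrix is an arbitrary
# illustration, not the book's example.
P_true = np.array([[0.1, 0.6, 0.3],
                   [0.4, 0.2, 0.4],
                   [0.5, 0.3, 0.2]])

def estimate(n_sessions, session_len=5):
    """Estimate the transition matrix by counting transitions in simulated sessions."""
    counts = np.zeros_like(P_true)
    for _ in range(n_sessions):
        s = rng.integers(3)                          # random entry state
        for _ in range(session_len):
            s_next = rng.choice(3, p=P_true[s])      # sample the next click
            counts[s, s_next] += 1
            s = s_next
    rows = counts.sum(axis=1, keepdims=True)
    return np.divide(counts, rows, out=np.zeros_like(counts), where=rows > 0)

for n_sessions in (10, 100, 1_000, 10_000):
    err = np.linalg.norm(estimate(n_sessions) - P_true, ord="fro")
    print(f"{n_sessions:>6} sessions: Frobenius error {err:.3f}")
```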
Table 5.8 Cumulated rewards for P- and DP-Versions for simulations over virtual sessions

       (a) $p^a_{ss'} > p_{ss'}$       (b) $p^a_{ss'} = p_{ss'}$       (c) $p^a_{ss'} = p_{ss'}/2$
γ      P            DP                 P            DP                 P            DP
0.0    17,258,161   18,439,767         8,671,018    8,671,018          5,909,776    7,591,168
0.5    25,511,885   25,348,836         8,671,018    8,671,018          4,485,351    8,937,920
1.0    23,418,997   25,710,600         8,671,018    8,671,018          4,853,128    8,957,431
We now compare the P-Version (Sect. 5.1) with the complete DP-Version (Sect. 5.2) for this example using the cumulated rewards over all sessions. We use only one recommendation and, in the DP-Version, Algorithm 5.1.
Table 5.8 shows the results for 100,000 sessions for the three cases (a)-(c) and for different discount rates γ.
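One plausible way to set up such a simulation is sketched below (illustrative Python; the number of products, the random unconditional probabilities and rewards, and the scaling factors are assumptions, not the book's actual experiment): the conditional transition probabilities $p^a_{ss'}$ are derived from the unconditional ones $p_{ss'}$ by scaling the entry of the recommended product and renormalizing, so that a factor above 1 mimics case (a), a factor of 1 case (b), and a factor below 1 case (c).

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4  # hypothetical number of products/states

# Unconditional transition probabilities p_u[s, s'] and rewards r[s, s'],
# drawn at random purely for illustration.
p_u = rng.dirichlet(np.ones(n), size=n)
r = rng.uniform(0.0, 10.0, size=(n, n))

def conditional(factor):
    """Conditional transition tensor p[a, s, s']: probability of moving from s
    to s' when product a is recommended.  The entry of the recommended product
    is scaled by `factor` and each row is renormalized."""
    p = np.repeat(p_u[None, :, :], n, axis=0)
    for a in range(n):
        p[a, :, a] *= factor
    return p / p.sum(axis=2, keepdims=True)

p_case_a, p_case_b, p_case_c = conditional(2.0), conditional(1.0), conditional(0.5)
```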
Case (b) is clear: the conditional probabilities are identical to the unconditional ones, so recommendations have no effect. If the conditional probabilities are larger than their unconditional counterparts, as in case (a), which means that delivering a recommendation increases the probability of a transition into the recommended state, the P-Version and the DP-Version perform equally well. The chain optimization also turns out to be effective.
Case (c) is more complex. It may look somewhat academic, because here, on the contrary, the delivery of a recommendation decreases the transition probability into the recommended state, but it is important for understanding the methods. Here the DP-Version performs much better than the P-Version. In terms of content this is clear, because the P-Version in some sense suffers from a kind of Russell's paradox: it recommends the products with the highest $p^a_{ss'} r^a_{ss'}$, but precisely because they are recommended, they are accepted less frequently! The chain optimization makes the situation even worse: it calculates the expected rewards more accurately, and in doing so it worsens the recommendations further! All this is rooted in the fact that for case (c) Assumption 5.1 concerning the P-Version is not merely violated but turned into its complete opposite. At the same time we see that the DP-Version handles the problem correctly.
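The mechanism described above can be reproduced in a small, self-contained experiment. The following sketch (again illustrative Python, not the book's Algorithm 5.1) pits a P-style policy, which recommends the product with the highest $p^a_{ss'} r^a_{ss'}$ for the recommended transition, against a DP-style policy obtained by value iteration on the conditional probabilities, in a case-(c) setting where recommending a product halves its acceptance weight:

```python
import numpy as np

rng = np.random.default_rng(1)
n, gamma, factor = 4, 0.5, 0.5            # factor < 1 corresponds to case (c)

# Toy environment: unconditional probabilities p_u, rewards r, and conditional
# probabilities p_c[a, s, s'] when product a is recommended (all illustrative).
p_u = rng.dirichlet(np.ones(n), size=n)
r = rng.uniform(0.0, 10.0, size=(n, n))
p_c = np.repeat(p_u[None, :, :], n, axis=0)
for a in range(n):
    p_c[a, :, a] *= factor
p_c /= p_c.sum(axis=2, keepdims=True)

# P-style policy: score only the recommended transition, acceptance prob * reward.
acceptance = np.stack([p_c[a, :, a] for a in range(n)])   # shape (a, s)
pi_P = (acceptance * r.T).argmax(axis=0)

# DP-style policy: value iteration over the full conditional distribution.
V = np.zeros(n)
for _ in range(200):
    Q = np.einsum('ast,st->as', p_c, r) + gamma * np.einsum('ast,t->as', p_c, V)
    V = Q.max(axis=0)
pi_DP = Q.argmax(axis=0)

def cumulated_reward(pi, n_sessions=20_000, session_len=5):
    """Total reward over virtual sessions with one recommendation per step."""
    total = 0.0
    for _ in range(n_sessions):
        s = rng.integers(n)
        for _ in range(session_len):
            s_next = rng.choice(n, p=p_c[pi[s], s])
            total += r[s, s_next]
            s = s_next
    return total

print("P-style :", round(cumulated_reward(pi_P)))
print("DP-style:", round(cumulated_reward(pi_DP)))
```

The exact totals depend on the randomly drawn toy instance, but the DP-style policy evaluates the full conditional distribution, including what happens when the recommendation is not accepted, plus the discounted future, whereas the P-style score looks only at the recommended transition; this is precisely the weakness that case (c) exposes.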