Table 5.7 Frobenius error norms for simulations over virtual sessions with sometimes recommendations left out, 2 recommendations

                Linear                                    Nonlinear
#sessions       $\|\Delta p^c\|_F$   $\|\Delta p^u\|_F$   $\|\Delta p^c\|_F$   $\|\Delta p^u\|_F$
10              1.458                1.044                1.095                1.382
100             1.073                0.667                0.189                0.616
1,000           0.508                0.240                0.051                0.151
10,000          0.172                0.102                0.029                0.069
100,000         0.041                0.016                0.004                0.017
1,000,000       0.006                0.001                0.002                0.086
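The error norms in Table 5.7 shrink as the number of simulated sessions grows, roughly at the familiar Monte Carlo rate of one over the square root of the number of observed transitions. To make the notion of a Frobenius error norm concrete, the following sketch (purely illustrative Python; the 3-state chain, the session length, and all names are assumptions, not the book's simulation) estimates a transition matrix by counting transitions in simulated sessions and reports $\|\hat{P} - P\|_F$ for increasing numbers of sessions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 3-state chain; the true transition matrix is an arbitrary
# illustration, not the book's example.
P_true = np.array([[0.1, 0.6, 0.3],
                   [0.4, 0.2, 0.4],
                   [0.5, 0.3, 0.2]])

def estimate(n_sessions, session_len=5):
    """Estimate the transition matrix by counting transitions in simulated sessions."""
    counts = np.zeros_like(P_true)
    for _ in range(n_sessions):
        s = rng.integers(3)                          # random entry state
        for _ in range(session_len):
            s_next = rng.choice(3, p=P_true[s])      # sample the next click
            counts[s, s_next] += 1
            s = s_next
    rows = counts.sum(axis=1, keepdims=True)
    return np.divide(counts, rows, out=np.zeros_like(counts), where=rows > 0)

for n_sessions in (10, 100, 1_000, 10_000):
    err = np.linalg.norm(estimate(n_sessions) - P_true, ord="fro")
    print(f"{n_sessions:>6} sessions: Frobenius error {err:.3f}")
```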
Table 5.8 Cumulated rewards for P- and DP-Versions for simulations over virtual sessions

       (a) $p^a_{ss'} > p_{ss'}$       (b) $p^a_{ss'} = p_{ss'}$       (c) $p^a_{ss'} = p_{ss'}/2$
γ      P            DP                 P            DP                 P            DP
0.0    17,258,161   18,439,767         8,671,018    8,671,018          5,909,776    7,591,168
0.5    25,511,885   25,348,836         8,671,018    8,671,018          4,485,351    8,937,920
1.0    23,418,997   25,710,600         8,671,018    8,671,018          4,853,128    8,957,431
We now compare the P-Version (Sect. 5.1) with the complete DP-Version (Sect. 5.2) for this example using the cumulated rewards over all sessions. We use only one recommendation and, in the DP-Version, Algorithm 5.1.
Table 5.8 shows the results for 100,000 sessions for the three cases (a)-(c) and for different discount rates γ.
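One plausible way to set up such a simulation is sketched below (illustrative Python; the number of products, the random unconditional probabilities and rewards, and the scaling factors are assumptions, not the book's actual experiment): the conditional transition probabilities $p^a_{ss'}$ are derived from the unconditional ones $p_{ss'}$ by scaling the entry of the recommended product and renormalizing, so that a factor above 1 mimics case (a), a factor of 1 case (b), and a factor below 1 case (c).

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4  # hypothetical number of products/states

# Unconditional transition probabilities p_u[s, s'] and rewards r[s, s'],
# drawn at random purely for illustration.
p_u = rng.dirichlet(np.ones(n), size=n)
r = rng.uniform(0.0, 10.0, size=(n, n))

def conditional(factor):
    """Conditional transition tensor p[a, s, s']: probability of moving from s
    to s' when product a is recommended.  The entry of the recommended product
    is scaled by `factor` and each row is renormalized."""
    p = np.repeat(p_u[None, :, :], n, axis=0)
    for a in range(n):
        p[a, :, a] *= factor
    return p / p.sum(axis=2, keepdims=True)

p_case_a, p_case_b, p_case_c = conditional(2.0), conditional(1.0), conditional(0.5)
```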
Case (b) is clear: the conditional probabilities are identical to the unconditional ones, so recommendations have no effect. If the conditional probabilities are larger than their unconditional counterparts, as in case (a), which means that delivering a recommendation increases the probability of a transition into the recommended state, the P-Version and the DP-Version perform equally well. The chain optimization also turns out to be effective.
Case (c) is more complex. It may look somewhat academic, because here, on the contrary, the delivery of a recommendation decreases the transition probability into the recommended state, but it is important for understanding the methods. Here the DP-Version performs much better than the P-Version. In terms of content this is clear, because the P-Version in some sense suffers from a kind of Russell's paradox: it recommends the products with the highest $p^a_{ss'} r^a_{ss'}$, but precisely because they are recommended, they are accepted less frequently! The chain optimization makes the situation even worse: it calculates the expected rewards more accurately, and in doing so it worsens the recommendations further! All this is rooted in the fact that for case (c) Assumption 5.1 concerning the P-Version is not merely violated but turned into its complete opposite. At the same time we see that the DP-Version handles the problem correctly.
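The mechanism described above can be reproduced in a small, self-contained experiment. The following sketch (again illustrative Python, not the book's Algorithm 5.1) pits a P-style policy, which recommends the product with the highest $p^a_{ss'} r^a_{ss'}$ for the recommended transition, against a DP-style policy obtained by value iteration on the conditional probabilities, in a case-(c) setting where recommending a product halves its acceptance weight:

```python
import numpy as np

rng = np.random.default_rng(1)
n, gamma, factor = 4, 0.5, 0.5            # factor < 1 corresponds to case (c)

# Toy environment: unconditional probabilities p_u, rewards r, and conditional
# probabilities p_c[a, s, s'] when product a is recommended (all illustrative).
p_u = rng.dirichlet(np.ones(n), size=n)
r = rng.uniform(0.0, 10.0, size=(n, n))
p_c = np.repeat(p_u[None, :, :], n, axis=0)
for a in range(n):
    p_c[a, :, a] *= factor
p_c /= p_c.sum(axis=2, keepdims=True)

# P-style policy: score only the recommended transition, acceptance prob * reward.
acceptance = np.stack([p_c[a, :, a] for a in range(n)])   # shape (a, s)
pi_P = (acceptance * r.T).argmax(axis=0)

# DP-style policy: value iteration over the full conditional distribution.
V = np.zeros(n)
for _ in range(200):
    Q = np.einsum('ast,st->as', p_c, r) + gamma * np.einsum('ast,t->as', p_c, V)
    V = Q.max(axis=0)
pi_DP = Q.argmax(axis=0)

def cumulated_reward(pi, n_sessions=20_000, session_len=5):
    """Total reward over virtual sessions with one recommendation per step."""
    total = 0.0
    for _ in range(n_sessions):
        s = rng.integers(n)
        for _ in range(session_len):
            s_next = rng.choice(n, p=p_c[pi[s], s])
            total += r[s, s_next]
            s = s_next
    return total

print("P-style :", round(cumulated_reward(pi_P)))
print("DP-style:", round(cumulated_reward(pi_DP)))
```

The exact totals depend on the randomly drawn toy instance, but the DP-style policy evaluates the full conditional distribution, including what happens when the recommendation is not accepted, plus the discounted future, whereas the P-style score looks only at the recommended transition; this is precisely the weakness that case (c) exposes.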