Conversely, it may occur, though rather seldom in practice, that the response to the recommendation $a$ leads to a decrease in the associated transition probability (for instance, through cannibalization among multiple product recommendations). We then have $p^{a}_{ss_a} \ll p_{ss_a}$ and hence $c(s,a) \gg 1$: the unconditional action value would be weighted extremely strongly, an apparently absurd effect. However, it should be noted here that because of the relationship $\sum_{s' \neq s_a} p_{ss'} = 1 - p_{ss_a}$, a large $p_{ss_a}$ leads to a small unconditional action value, so the limiting behavior has to be considered here. We will explore this in more depth in the course of the special cases of (5.6).
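The compensation between a large weight and a small unconditional sum can be checked numerically. The following is a minimal sketch, assuming the scaling factor has the form $c(s,a) = (1 - p^{a}_{ss_a})/(1 - p_{ss_a})$ and that rewards are bounded by 1; the function name and the sample probabilities are hypothetical and chosen only for illustration.

```python
def weighted_unconditional_mass(p_cond, p_uncond):
    """Return (c, c * sum_{s' != s_a} p_ss') for one recommendation.

    p_cond   -- assumed conditional probability p^a_{ss_a} (a is recommended)
    p_uncond -- assumed unconditional probability p_{ss_a} (no recommendation)
    """
    c = (1.0 - p_cond) / (1.0 - p_uncond)   # assumed scaling factor c(s, a)
    remaining = 1.0 - p_uncond              # sum of p_ss' over all s' != s_a
    return c, c * remaining


# Hypothetical case: the recommendation depresses the transition probability
# to 0.05 while the unconditional probability approaches 1.
for p_uncond in (0.9, 0.99, 0.999):
    c, weighted = weighted_unconditional_mass(p_cond=0.05, p_uncond=p_uncond)
    print(f"p_uncond={p_uncond}: c={c:.1f}, weighted mass={weighted:.3f}")
```

Under these assumptions the weight $c(s,a)$ grows without bound as $p_{ss_a}$ approaches 1, but the weighted probability mass stays at $1 - p^{a}_{ss_a}$, so the weighted unconditional term remains bounded.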
For a quantitatively better understanding of (5.6), let us consider for the product $s$ the difference between the action values of two recommendations $a$ and $b$:

$$
\begin{aligned}
q^{\pi}(s,a) - q^{\pi}(s,b)
&= p^{a}_{ss_a} r_{ss_a} + c(s,a) \sum_{s' \neq s_a} p_{ss'} r_{ss'} - p^{b}_{ss_b} r_{ss_b} - c(s,b) \sum_{s' \neq s_b} p_{ss'} r_{ss'} \\
&= p^{a}_{ss_a} r_{ss_a} - p^{b}_{ss_b} r_{ss_b} + c(s,a)\, p_{ss_b} r_{ss_b} - c(s,b)\, p_{ss_a} r_{ss_a} \\
&\quad + c(s,a) \sum_{s' \neq s_a, s_b} p_{ss'} r_{ss'} - c(s,b) \sum_{s' \neq s_a, s_b} p_{ss'} r_{ss'} \\
&= \big[ p^{a}_{ss_a} - c(s,b)\, p_{ss_a} \big] r_{ss_a} - \big[ p^{b}_{ss_b} - c(s,a)\, p_{ss_b} \big] r_{ss_b} + \big[ c(s,a) - c(s,b) \big] \sum_{s' \neq s_a, s_b} p_{ss'} r_{ss'}.
\end{aligned}
$$
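By preliminary use of the estimate $c(s,a) = \frac{1 - p^{a}_{ss_a}}{1 - p_{ss_a}} \approx 1$ and similarly $c(s,b) \approx 1$, we obtain

$$
q^{\pi}(s,a) - q^{\pi}(s,b) \approx \underbrace{\big[ p^{a}_{ss_a} - p_{ss_a} \big]}_{=:\,\Delta p_a} \underbrace{r_{ss_a}}_{=:\,r_a} - \underbrace{\big[ p^{b}_{ss_b} - p_{ss_b} \big]}_{=:\,\Delta p_b} \underbrace{r_{ss_b}}_{=:\,r_b} = \Delta p_a r_a - \Delta p_b r_b. \tag{5.7}
$$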
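To get a feeling for how close the approximation (5.7) is to the exact difference, the two can be compared on a small example. The sketch below assumes that the action value has the form $q^{\pi}(s,a) = p^{a}_{ss_a} r_{ss_a} + c(s,a) \sum_{s' \neq s_a} p_{ss'} r_{ss'}$ with $c(s,a) = (1 - p^{a}_{ss_a})/(1 - p_{ss_a})$. All probabilities, rewards, and function names are hypothetical, and the state "exit" (leaving without a purchase, reward 0) is an added assumption so that the unconditional probabilities sum to 1.

```python
def scaling(p_cond, p_uncond):
    """Assumed scaling factor c = (1 - p_cond) / (1 - p_uncond)."""
    return (1.0 - p_cond) / (1.0 - p_uncond)


def action_value(p_cond, target, uncond, rewards):
    """Assumed form: p^a_{ss_a} r_{ss_a} + c(s,a) * sum_{s' != s_a} p_{ss'} r_{ss'}."""
    c = scaling(p_cond, uncond[target])
    rest = sum(p * rewards[k] for k, p in uncond.items() if k != target)
    return p_cond * rewards[target] + c * rest


def approx_difference(pa_cond, pb_cond, uncond, rewards, a="a", b="b"):
    """Right-hand side of (5.7): delta_p_a * r_a - delta_p_b * r_b."""
    delta_pa = pa_cond - uncond[a]
    delta_pb = pb_cond - uncond[b]
    return delta_pa * rewards[a] - delta_pb * rewards[b]


# Hypothetical unconditional transition probabilities from product s.
uncond = {"a": 0.10, "b": 0.02, "other": 0.08, "exit": 0.80}
rewards = {"a": 1.0, "b": 1.0, "other": 1.0, "exit": 0.0}
pa_cond, pb_cond = 0.12, 0.06   # hypothetical probabilities under recommendation

exact = (action_value(pa_cond, "a", uncond, rewards)
         - action_value(pb_cond, "b", uncond, rewards))
approx = approx_difference(pa_cond, pb_cond, uncond, rewards)
print(f"exact={exact:.4f}, approx={approx:.4f}")
```

In this made-up example both values are negative, so $b$ comes out ahead of $a$ even though $a$ has the higher transition probability both with and without a recommendation. The gap between the exact and the approximate value comes from the factors $c(s,a)$ and $c(s,b)$ not being exactly 1.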
If, for the sake of simplicity, we initially set all rewards to 1, we have

$$
q^{\pi}(s,a) - q^{\pi}(s,b) \approx \Delta p_a - \Delta p_b.
$$
Since we can generally assume that for a product $s$ the probability of a transition to a product $s_y$ is higher if $y$ is recommended, we have $p^{y}_{ss_y} > p_{ss_y}$, and we obtain the following interpretation. The recommendation $a$ is then certainly better than the recommendation $b$ if the increase $\Delta p_a$ in the transition probability brought about by the product recommendation is greater than the corresponding increase $\Delta p_b$ for the recommendation $b$. Instead therefore of making recommendations $a$ with the highest transition probability $p_a$ ($:= p_{ss_a}$) as in classical data mining, the recommendations $a$ that are made are those with the highest difference between the