$\Delta p^a$. That is, those recommendations which most increase the transition probability. This is also logical, since the unconditional transition probabilities $p_{ss_a}$ are in fact achieved even without recommendations. Incidentally, it can generally be assumed that in most cases the relationship between conditional and unconditional transition probability, $\Delta p^a \sim p_{ss_a}$, holds approximately, which is why classical data mining methods work to some extent (see also the previous section: $\Delta p^a = p^a_{ss_a} - p_{ss_a} = d\,p_{ss_a} - p_{ss_a} = (d - 1)\,p_{ss_a}$).
If we now drop the restriction to 1-rewards, then, according to (5.7), recommendation $a$ is certainly better than recommendation $b$ if the gain in action value $\Delta p^a r_a$, which results from the transition probability increased by recommending $a$, is higher than the gain in action value $\Delta p^b r_b$ due to recommendation $b$. This is an extension of the preceding case, and its meaning is clear.
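The comparison of $\Delta p^a r_a$ against $\Delta p^b r_b$ can be illustrated with a minimal Python sketch. The class `Candidate`, the function `rank_by_value_gain`, and all numbers are hypothetical and chosen only for illustration; the sketch assumes the simplification in which only the recommended product's own transition probability changes.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """A recommendable product (hypothetical illustration type)."""
    name: str
    p_uncond: float  # p_{ss_a}: transition probability without recommendation
    p_cond: float    # p^a_{ss_a}: transition probability when a is recommended
    reward: float    # r_a: reward of the transition

    @property
    def uplift(self) -> float:
        # Delta p^a = p^a_{ss_a} - p_{ss_a}
        return self.p_cond - self.p_uncond

    @property
    def value_gain(self) -> float:
        # Delta p^a * r_a: gain in action value due to the recommendation
        return self.uplift * self.reward


def rank_by_value_gain(candidates):
    """Order candidates by Delta p * r, largest gain first."""
    return sorted(candidates, key=lambda c: c.value_gain, reverse=True)


# Made-up example values, not taken from the text.
candidates = [
    Candidate("a", p_uncond=0.05, p_cond=0.08, reward=20.0),
    Candidate("b", p_uncond=0.02, p_cond=0.10, reward=5.0),
    Candidate("c", p_uncond=0.10, p_cond=0.11, reward=30.0),
]

ranking = [c.name for c in rank_by_value_gain(candidates)]
by_uplift = [c.name for c in sorted(candidates, key=lambda c: c.uplift, reverse=True)]
print(ranking, by_uplift)
```

In this made-up data, product b has the largest uplift in transition probability, but product a yields the larger gain in action value once the rewards are taken into account, so the two orderings differ.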
It should, however, be remembered that even the simplification $c(s,a) \approx 1$ is not always valid in practice and, moreover, is not always useful. To understand this, consider just our two products $a$ and $b$. We assume that $a$ has a high reward $r_a$ but $\Delta p^a = 0$, i.e., the transition to the product occurs equally well without a recommendation and brings high sales. Now let $b$ be a product whose recommendation is strongly accepted but which is associated with a low reward $r_b$. In reality, $q^\pi(s,a) > q^\pi(s,b)$, but the simplification delivers the opposite:
$$q^\pi(s,a) - q^\pi(s,b) \approx \Delta p^a r_a - \Delta p^b r_b = 0 - \Delta p^b r_b < 0.$$
So the simplification $c(s,a) \approx 1$ conceals the risk of down-selling; therefore, we do not generally apply it in practice. If we drop it, we obtain for our example
$$q^\pi(s,a) - q^\pi(s,b) = \underbrace{\left[ p^a_{ss_a} - c(s,b)\, p_{ss_a} \right]}_{\Delta p^a(b)} r_a - \Delta p^b r_b = \Delta p^a(b)\, r_a - \Delta p^b r_b.$$
Since $c(s,b) = \frac{1 - p^b_{ss_b}}{1 - p_{ss_b}}$ decreases with increasing $\Delta p^b = p^b_{ss_b} - p_{ss_b}$, this leads in turn to an increasing $\Delta p^a(b)$. The raising of the action value by the increased acceptance of $b$ works against its decrease by the reduced action potential of $a$, which is reflected by the scaling factor $c$. The decision between recommendations $a$ and $b$ depends on which of the two effects predominates.
We have therefore established that if $p_{ss_a}$ and $p^a_{ss_a}$ are both small, it is practical to work with $c(s,a) = 1$ (even if $p^a_{ss_a}$ is higher than $p_{ss_a}$ by a multiple, or vice versa). Otherwise we must work with the exact scaling factor $c(s,a)$.
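The effect of the scaling factor can be checked numerically with a small Python sketch. It assumes the scaling factor has the form $c(s,b) = (1 - p^b_{ss_b}) / (1 - p_{ss_b})$ and that, as in the example, $\Delta p^a = 0$; the function names and all numbers are hypothetical and serve only as illustration.

```python
def scaling_factor(p_cond: float, p_uncond: float) -> float:
    """c(s,b) = (1 - p^b_{ss_b}) / (1 - p_{ss_b}), assumed form of the scaling factor."""
    return (1.0 - p_cond) / (1.0 - p_uncond)


def simplified_difference(p_a, p_a_cond, r_a, p_b, p_b_cond, r_b):
    """q(s,a) - q(s,b) under c ~ 1: Delta p^a r_a - Delta p^b r_b."""
    return (p_a_cond - p_a) * r_a - (p_b_cond - p_b) * r_b


def exact_difference(p_a, p_a_cond, r_a, p_b, p_b_cond, r_b):
    """q(s,a) - q(s,b) for the example with Delta p^a = 0.

    Uses Delta p^a(b) = p^a_{ss_a} - c(s,b) * p_{ss_a}.
    """
    c_b = scaling_factor(p_b_cond, p_b)
    delta_p_a_given_b = p_a_cond - c_b * p_a
    return delta_p_a_given_b * r_a - (p_b_cond - p_b) * r_b


# Case 1 (made-up numbers): a sells well anyway and has a high reward,
# b is strongly accepted when recommended but has a low reward.
large = dict(p_a=0.5, p_a_cond=0.5, r_a=100.0, p_b=0.1, p_b_cond=0.3, r_b=5.0)
simple_large = simplified_difference(**large)
exact_large = exact_difference(**large)

# Case 2 (made-up numbers): all transition probabilities are small,
# although recommending b triples its probability.
small = dict(p_a=0.01, p_a_cond=0.01, r_a=100.0, p_b=0.002, p_b_cond=0.006, r_b=5.0)
c_small = scaling_factor(small["p_b_cond"], small["p_b"])
simple_small = simplified_difference(**small)
exact_small = exact_difference(**small)

print(simple_large, exact_large)
print(c_small, simple_small, exact_small)
```

With these illustrative values the simplified difference is negative in the first case while the exact one is positive, which is the down-selling risk described above. In the second case the scaling factor stays close to 1 and both variants agree in sign.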