$\Delta p_a$. That is, those recommendations which most increase the transition probability. This is also logical, since the unconditional transition probabilities $p_a$ are in fact achieved even without recommendations. Incidentally, it can generally be assumed that in most cases the relationship between conditional and unconditional transition probability, $\Delta p_a \sim p_a$, holds approximately, which is why classical data mining methods work to some extent (see also previous section: $\Delta p_a = p^a_{ss_a} - p_{ss_a} = d\,p_{ss_a} - p_{ss_a} = (d - 1)\,p_{ss_a} \sim p_a$).
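The proportionality argument can be illustrated with a minimal sketch. The uplift factor `d` and the probabilities below are hypothetical values chosen only for illustration; the sketch assumes the same `d` applies to every product, which is an idealization of the approximate relationship described above.

```python
# Hypothetical common uplift factor: p^a_{ss_a} = d * p_{ss_a} (assumption).
d = 3.0

# Hypothetical unconditional transition probabilities for three products.
p_uncond = {"a": 0.02, "b": 0.10, "c": 0.05}

# Delta p = p_cond - p_uncond = (d - 1) * p_uncond
delta_p = {name: (d - 1.0) * p for name, p in p_uncond.items()}

# Ranking by unconditional probability (what a classical method would use)
rank_by_p = sorted(p_uncond, key=p_uncond.get, reverse=True)
# Ranking by the increase in transition probability
rank_by_delta = sorted(delta_p, key=delta_p.get, reverse=True)

print(rank_by_p, rank_by_delta)
```

Under this assumption both rankings coincide, because multiplying by a positive constant preserves order. If `d` differed strongly between products, the two rankings could diverge.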
If we now discard the restriction regarding the 1-rewards, then, according to (5.7), recommendation $a$ will certainly be better than recommendation $b$ if the difference in action value $\Delta p_a r_a$, resulting from the transition probability increased by recommending product $a$, is higher than the difference in action value $\Delta p_b r_b$ due to recommendation $b$. This is an extension of the preceding case and its content is clear.
It should, however, be remembered that even the simplification $c(s,a) \approx 1$ is not always valid in practice and, moreover, is not always useful. To understand this, consider just our two products $a$ and $b$. We assume that $a$ has a high reward $r_a$, but $\Delta p_a = 0$, i.e., the transition to the product occurs equally well without a recommendation and brings high sales. Now let $b$ be a product whose recommendation is strongly accepted but which is associated with a low reward $r_b$. The reality is $q^\pi(s,a) > q^\pi(s,b)$, but the simplification delivers the opposite:

$$q^\pi(s,a) - q^\pi(s,b) \approx \Delta p_a r_a - \Delta p_b r_b = 0 - \Delta p_b r_b < 0.$$
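A minimal numerical sketch of this effect follows. All probabilities and rewards are hypothetical values chosen for illustration, and `simplified_gain` is an illustrative helper name, not something defined in the text.

```python
def simplified_gain(p_cond, p_uncond, reward):
    """Delta p * r: the action-value gain when the scaling factor is taken as 1."""
    return (p_cond - p_uncond) * reward

# Product a (hypothetical): high reward, recommending it does not change
# the transition probability, so Delta p_a = 0.
gain_a = simplified_gain(p_cond=0.30, p_uncond=0.30, reward=100.0)

# Product b (hypothetical): low reward, but the recommendation is
# strongly accepted.
gain_b = simplified_gain(p_cond=0.40, p_uncond=0.10, reward=5.0)

# A negative difference means the simplified criterion prefers b.
difference = gain_a - gain_b
print(difference)
```

With these numbers the simplified criterion ranks $b$ above $a$, since the gain attributed to $a$ is exactly zero regardless of how large $r_a$ is.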
So the simplification $c(s,a) \approx 1$ conceals the risk of down-selling; therefore, we do not generally apply it in practice. If we do without it, we obtain for our example
$$q^\pi(s,a) - q^\pi(s,b) = \underbrace{\left[ p^a_{ss_a} - c(s,b)\, p_{ss_a} \right]}_{\Delta p_a(b)} r_a - \Delta p_b r_b = \Delta p_a(b)\, r_a - \Delta p_b r_b.$$

Since with increasing $\Delta p_b = p^b_{ss_b} - p_{ss_b}$, conversely, $c(s,b) = \frac{1 - p^b_{ss_b}}{1 - p_{ss_b}}$ decreases, this leads in turn to an increasing $\Delta p_a(b)$. The raising of the action value by the increased acceptance of $b$ works against its decrease by the reduced action potential of $a$, which is reflected by the scaling factor $c$. The decision between recommendations $a$ and $b$ depends on which of the two effects predominates.

We have therefore established that if $p_{ss_a}$ and $p^a_{ss_a}$ are both small, it is practical to work with $c(s,a) = 1$ (even if $p^a_{ss_a}$ is higher than $p_{ss_a}$ by a multiple, or vice versa). Otherwise we must work with the exact scaling factor $c(s,a)$.
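The interplay of the two effects can be sketched numerically. The sketch assumes an action value of the form $q^\pi(s,x) = p^x_{ss_x} r_x + c(s,x) \sum_{y \neq x} p_{ss_y} r_y$, which is consistent with the difference discussed here but is an assumption, since (5.7) itself is not reproduced in this section. Function names and all numbers are hypothetical.

```python
def scaling_factor(p_cond, p_uncond):
    """c(s, x) = (1 - p^x_{ss_x}) / (1 - p_{ss_x})."""
    return (1.0 - p_cond) / (1.0 - p_uncond)

def exact_difference(p_a, p_a_cond, r_a, p_b, p_b_cond, r_b):
    """q(s, a) - q(s, b) for two rewarded products (assumed model).

    Recommending x lifts the transition to x to its conditional value
    and rescales the other unconditional transition by c(s, x).
    """
    q_a = p_a_cond * r_a + scaling_factor(p_a_cond, p_a) * p_b * r_b
    q_b = p_b_cond * r_b + scaling_factor(p_b_cond, p_b) * p_a * r_a
    return q_a - q_b

# Hypothetical example: a has a high reward and Delta p_a = 0,
# b has a low reward and a strongly accepted recommendation.
diff = exact_difference(p_a=0.30, p_a_cond=0.30, r_a=100.0,
                        p_b=0.10, p_b_cond=0.40, r_b=5.0)
print(diff)  # positive here: a is preferred once c(s, b) is kept

# For small probabilities the scaling factor stays close to 1,
# even though the conditional value is five times the unconditional one.
c_small = scaling_factor(p_cond=0.005, p_uncond=0.001)
print(c_small)
```

In this example the term $\Delta p_a(b)\, r_a$ outweighs $\Delta p_b r_b$, so the exact comparison favors $a$. The last two lines illustrate why $c(s,a) = 1$ is a workable approximation when both probabilities are small.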