We designate the corresponding approach using unconditional transition probabilities as the unconditional or probabilistic approach, in contrast to the conditional approach using conditional probabilities. Below, we derive expressions for both approaches and then combine them in a useful form.
5.1 Unconditional Approach
The probabilistic approach thus corresponds to the classical view described by Approach 1 in Chap. 2. Here, however, we apply RL methods formally, for instance to factor in rewards and chain optimization; that is, we use the formal advantages of RL to broaden the classical approach. Thus, instead of recommending the product s' that is bought most frequently after s has been viewed, we also incorporate its reward r_{ss'}. That seems logical. In addition, we give preference to those products that lead to the highest sales, including the subsequent purchases of existing customers. So chain optimization seems reasonable as well.
In the strict RL sense, this is nonsensical: we merely learn the policy that users follow on their own initiative, regardless of the recommendations, and then reinforce it. But, as we have seen, the generalized general policy iteration at least offers a general justification of this approach, and the results are good in practice.
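The difference between ranking successors by frequency, by expected reward, and by chain optimization can be sketched in Python. This is a minimal illustration, not the book's implementation: the product names, probabilities, rewards, the discount rate GAMMA and the helpers evaluate and rank are all hypothetical, and chain optimization is modeled here, as an assumption, by the standard discounted value v(s) = sum_s' p_ss' (r_ss' + gamma v(s')), computed by iterative policy evaluation. As the text cautions, these values describe only what users do on their own; the recommendations themselves do not enter the computation.

```python
GAMMA = 0.5  # hypothetical discount rate for subsequent purchases

# Hypothetical unconditional transition probabilities p[s][s'] and rewards r[s][s'].
# A row sum below 1 means the session may end after viewing s.
p = {
    "A": {"B": 0.35, "C": 0.45},
    "B": {"C": 0.5},
    "C": {},
}
r = {
    "A": {"B": 10.0, "C": 12.0},
    "B": {"C": 40.0},
    "C": {},
}


def evaluate(p, r, gamma, sweeps=50):
    """Iteratively evaluate v(s) = sum_s' p_ss' * (r_ss' + gamma * v(s'))."""
    v = {s: 0.0 for s in p}
    for _ in range(sweeps):
        v = {s: sum(prob * (r[s][t] + gamma * v[t]) for t, prob in p[s].items())
             for s in p}
    return v


def rank(s, score):
    """Return the successors s' of s, sorted by descending score(s')."""
    return sorted(p[s], key=score, reverse=True)


v = evaluate(p, r, GAMMA)
by_frequency = rank("A", lambda t: p["A"][t])                             # classical: most frequent next product
by_reward = rank("A", lambda t: p["A"][t] * r["A"][t])                    # expected immediate reward
by_chain = rank("A", lambda t: p["A"][t] * (r["A"][t] + GAMMA * v[t]))    # including follow-up purchases

print(by_frequency)  # ['C', 'B']
print(by_reward)     # ['C', 'B']
print(by_chain)      # ['B', 'C']
```

In this toy data, C is both the most frequent and the most rewarding immediate successor of A, but B wins once the valuable follow-up purchase B -> C is taken into account.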
The approach thus tends to implement the recommendations with the highest unconditional transition probabilities and action values. Let us start with the "simple" case (3.5). Using (3.5) directly would get us no further here, since
$$q^\pi(s,a) = \sum_{s'} p^a_{ss'} r^a_{ss'} = \sum_{s'} p_{ss'} r_{ss'} = q_0(s)$$
would yield the same action value for all recommendations a. Indeed, if the transition probabilities were independent of the recommendations issued, no recommendations would be needed at all. Therefore, we make the following assumption:
Assumption 5.1 (Unconditional probability property): For each state transition from s to the state s_a associated with the action a, the transition probability p^a_{ss_a} is considered as proportional to the unconditional probability p_{ss_a}, i.e., p^a_{ss_a} = d p_{ss_a} with the factor d > 1. For any other state transition from s to s' under the action a, the transition probability p^a_{ss'} is likewise considered as proportional to the unconditional probability p_{ss'}, i.e., p^a_{ss'} = c p_{ss'}, but with the factor c < 1.
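Assumption 5.1 can be illustrated numerically with a small Python sketch. It builds the conditional probabilities p^a_ss' from hypothetical unconditional ones and evaluates the one-step action value q(s, a) = sum_s' p^a_ss' r_ss' for each recommendation. Two points are assumptions of this sketch, not statements of the text: the p_ss' for fixed s are taken to sum to one and c is chosen so that the p^a_ss' do as well, and the reward r_ss' is taken not to depend on a. The data and the helpers conditional_probs and action_value are hypothetical. Setting d = 1 reproduces the situation described above, in which every action yields the same value q_0(s).

```python
# Hypothetical data for one viewed product s: unconditional transition
# probabilities p_ss' (assumed here to sum to one) and rewards r_ss'.
p_s = {"A": 0.5, "B": 0.3, "C": 0.2}
r_s = {"A": 10.0, "B": 40.0, "C": 15.0}


def conditional_probs(p_s, s_a, d):
    """Return p^a_ss' under Assumption 5.1 for the action a that recommends s_a.

    The recommended product s_a is boosted by the factor d; every other
    successor is scaled by c. ASSUMPTION (not stated in the text): c is chosen
    so that the conditional probabilities still sum to one, which yields c < 1
    whenever d > 1 and p_ss_a > 0.
    """
    p_a = p_s[s_a]
    if p_a >= 1.0 or d * p_a > 1.0:
        raise ValueError("need p_ss_a < 1 and d * p_ss_a <= 1")
    c = (1.0 - d * p_a) / (1.0 - p_a)
    return {t: (d if t == s_a else c) * prob for t, prob in p_s.items()}


def action_value(p_s, r_s, s_a, d):
    """One-step action value q(s, a) = sum_s' p^a_ss' * r_ss'.

    ASSUMPTION: the reward r_ss' does not depend on the recommendation a.
    """
    p_cond = conditional_probs(p_s, s_a, d)
    return sum(p_cond[t] * r_s[t] for t in p_s)


q0 = sum(p_s[t] * r_s[t] for t in p_s)  # q_0(s) from the unconditional probabilities
for s_a in p_s:
    # d = 1: recommendations have no effect, so every action gives q_0(s).
    print(s_a, q0, action_value(p_s, r_s, s_a, d=1.0), action_value(p_s, r_s, s_a, d=1.5))
```

With d = 1.5 in this toy example, recommending B gives the highest action value, since it combines a moderate unconditional probability with the highest reward; how d is actually determined is not addressed by this sketch.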
In other words, delivering recommendation a increases the probability of the transition to the product s_a associated with it. The higher the unconditional