probability $p_{ss_a}$, the higher the conditional probability $p^a_{ss_a}$. Conversely, delivering $a$ has a negative influence on the transition probabilities $p^a_{ss'}$ for all other products $s' \neq s_a$, since (3.2) applies. Of course, this probability property is somewhat abstract: because the equation system is strongly overdetermined, $c$ and $d$ cannot, in general, be determined uniquely. Nevertheless, it is helpful for qualitative discussion.
Thus (3.5) takes the following form:

$$q^\pi(s,a) = p^a_{ss_a} r_{ss_a} + \sum_{s' \neq s_a} p^a_{ss'} r_{ss'} = d\,p_{ss_a} r_{ss_a} + \sum_{s' \neq s_a} c\,p_{ss'} r_{ss'}$$
and, since $d > c$, yields

$$q^\pi(s,a) - q^\pi(s,b) = (d - c)\left(p_{ss_a} r_{ss_a} - p_{ss_b} r_{ss_b}\right) > 0 \quad\Leftrightarrow\quad p_{ss_a} r_{ss_a} > p_{ss_b} r_{ss_b}.$$
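The equivalence can be checked numerically. The following is a minimal sketch, assuming one fixed state $s$, randomly drawn unconditional probabilities and rewards, and arbitrary illustrative values $c = 0.8$, $d = 1.5$ (chosen only so that $d > c$, not fitted to satisfy (3.2)). The helper names `q_unconditional` and `ranking_matches` are hypothetical.

```python
import random


def q_unconditional(p, r, s_a, c, d):
    """q^pi(s, a) for gamma = 0 under the unconditional assumption.

    p, r: dicts mapping each successor product s' to the unconditional
    transition probability p_{ss'} and reward r_{ss'} (fixed state s).
    s_a: the product recommended by action a.
    Delivering a scales p_{ss_a} by d and every other p_{ss'} by c.
    """
    value = d * p[s_a] * r[s_a]
    for s_next in p:
        if s_next != s_a:
            value += c * p[s_next] * r[s_next]
    return value


def ranking_matches(num_products=5, c=0.8, d=1.5, seed=0):
    """Check the (d - c) identity and compare the two rankings."""
    rng = random.Random(seed)
    products = list(range(num_products))
    weights = [rng.random() for _ in products]
    total = sum(weights)
    p = {s: w / total for s, w in zip(products, weights)}
    r = {s: rng.uniform(1.0, 10.0) for s in products}
    for a in products:
        for b in products:
            diff = q_unconditional(p, r, a, c, d) - q_unconditional(p, r, b, c, d)
            predicted = (d - c) * (p[a] * r[a] - p[b] * r[b])
            assert abs(diff - predicted) < 1e-9
    # Ranking by q^pi and ranking by p * r should coincide when d > c.
    by_q = sorted(products, key=lambda s: q_unconditional(p, r, s, c, d), reverse=True)
    by_pr = sorted(products, key=lambda s: p[s] * r[s], reverse=True)
    return by_q == by_pr


print(ranking_matches())
```

Because the common term $c \sum_{s'} p_{ss'} r_{ss'}$ cancels in every pairwise difference, the unknown constants $c$ and $d$ drop out of the ranking, which is why their non-uniqueness does not matter here.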
The formula for calculating the action value can be derived immediately from this:

$$q^P(s,a) = p_{ss_a} r_{ss_a}, \tag{5.1}$$
which we will refer to below as the (simplified) P-Version. A recommendation is thus strong if it is frequently clicked, carries a high reward, or both.
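To make the trade-off in (5.1) concrete, here is a minimal ranking sketch. The product names and the numbers for $p_{ss_a}$ and $r_{ss_a}$ are hypothetical, and `p_version_scores` and `recommend` are illustrative helpers, not part of any library.

```python
def p_version_scores(p, r):
    """Simplified P-Version (5.1): q^P(s, a) = p_{ss_a} * r_{ss_a} per candidate s_a."""
    return {s_a: p[s_a] * r[s_a] for s_a in p}


def recommend(p, r, k=1):
    """Return the k candidates with the highest simplified P-Version score."""
    scores = p_version_scores(p, r)
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical data for one state s: A is clicked often but earns little,
# B is clicked rarely but earns a lot, C lies in between.
p = {"A": 0.30, "B": 0.05, "C": 0.10}
r = {"A": 2.0, "B": 20.0, "C": 5.0}
print(recommend(p, r, k=3))  # ['B', 'A', 'C']
```

Here the rarely clicked but lucrative product B (score 1.0) beats the frequently clicked product A (score 0.6), illustrating that neither click frequency nor reward alone decides the ranking.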
Approach (5.1) may now be extended to the case $\gamma > 0$ in accordance with (3.6), which yields the full P-Version:
$$q^P(s,a) = p_{ss_a} r_{ss_a} + \gamma\, p_{ss_a} \sum_{a'} \pi(s_a, a')\, q^P(s_a, a'). \tag{5.2}$$
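An offline evaluation of (5.2) for a fixed policy $\pi$ can be sketched as plain fixed-point iteration. This is a swapped-in illustration (ordinary iterative policy evaluation), not the ADP procedure of Algorithm 3.3. It assumes that states and actions are both product ids, so that action $a$ leads to its recommended product $s_a = a$; the function name `full_p_version` and the toy data are hypothetical.

```python
def full_p_version(p, r, pi, gamma=0.5, tol=1e-10, max_iter=10_000):
    """Evaluate the full P-Version (5.2) by fixed-point iteration.

    Illustrative assumptions: states and actions are product ids, and
    action a leads (if accepted) to its recommended product s_a = a.
    p[s][a], r[s][a]: unconditional probability and reward of s -> a.
    pi[s][a]: probability that the policy recommends a in state s.
    """
    q = {s: {a: 0.0 for a in p[s]} for s in p}
    for _ in range(max_iter):
        # v(s_a) = sum_a' pi(s_a, a') q^P(s_a, a')
        v = {s: sum(pi[s][a] * q[s][a] for a in q[s]) for s in q}
        q_new = {
            s: {a: p[s][a] * (r[s][a] + gamma * v[a]) for a in p[s]} for s in p
        }
        delta = max(abs(q_new[s][a] - q[s][a]) for s in q for a in q[s])
        q = q_new
        if delta < tol:
            break
    return q


# Hypothetical three-product example with a uniform recommendation policy.
p = {"A": {"B": 0.2, "C": 0.1}, "B": {"A": 0.3, "C": 0.2}, "C": {"A": 0.1, "B": 0.4}}
r = {"A": {"B": 5.0, "C": 1.0}, "B": {"A": 2.0, "C": 8.0}, "C": {"A": 3.0, "B": 4.0}}
pi = {s: {a: 1.0 / len(p[s]) for a in p[s]} for s in p}
q = full_p_version(p, r, pi, gamma=0.5)
print(q)
```

Since every update multiplies the successor value by $\gamma\, p_{ss_a} < 1$, the iteration is a contraction and converges; with $\gamma = 0$ it reduces to (5.1) after one step.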
As described in Chap. 3, we can update $p_{ss_a}$ and $r_{ss_a}$ in realtime and thus calculate (5.1) and (5.2) either offline or online: (5.1) directly, and (5.2) by means of ADP methods such as Algorithm 3.3.
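One simple way to keep $p_{ss_a}$ and $r_{ss_a}$ current in realtime is with transition counts and incremental reward means. The sketch below assumes exactly that (relative frequencies for $p$, running averages for $r$, no forgetting); it is not necessarily the update scheme of Chap. 3, and the class `TransitionModel` and its methods are hypothetical.

```python
from collections import defaultdict


class TransitionModel:
    """Incrementally estimated unconditional transition probabilities and rewards.

    Hypothetical helper: p_{ss'} is the relative frequency of s -> s' among
    all transitions observed from s; r_{ss'} is the running mean reward.
    """

    def __init__(self):
        self.n_from = defaultdict(int)    # transitions observed leaving s
        self.n_trans = defaultdict(int)   # transitions observed for (s, s')
        self.r_mean = defaultdict(float)  # running mean reward for (s, s')

    def update(self, s, s_next, reward):
        self.n_from[s] += 1
        self.n_trans[(s, s_next)] += 1
        k = self.n_trans[(s, s_next)]
        # Incremental mean: r <- r + (reward - r) / k
        self.r_mean[(s, s_next)] += (reward - self.r_mean[(s, s_next)]) / k

    def p(self, s, s_next):
        n = self.n_from[s]
        return self.n_trans[(s, s_next)] / n if n else 0.0

    def r(self, s, s_next):
        return self.r_mean[(s, s_next)]

    def q_simple(self, s, s_a):
        """Simplified P-Version (5.1) from the current estimates."""
        return self.p(s, s_a) * self.r(s, s_a)


m = TransitionModel()
m.update("A", "B", 0.0)   # click from A to B, no purchase
m.update("A", "B", 10.0)  # click from A to B with reward 10
m.update("A", "C", 4.0)   # click from A to C with reward 4
print(m.p("A", "B"), m.r("A", "B"), m.q_simple("A", "C"))
```

Each event costs O(1), so (5.1) can be re-evaluated directly online after every transaction.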
Alternatively, in the model-free case, we can easily apply the TD-Version in a similar way, although we have to employ a few empirical tricks to overcome the problem of multiple recommendations. In practice, the unconditional approach works quite successfully, and the P-Version performs better than the TD-Version.
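For comparison, a single model-free step can be sketched with the standard TD(0) (SARSA-style) rule $q(s,a) \leftarrow q(s,a) + \alpha\,[r + \gamma\, q(s',a') - q(s,a)]$. The unconditional twist assumed here is that an observed transition $s \to s'$ is credited to the action recommending $s'$, whether or not it was shown; this sketch does not address the multiple-recommendation problem. The function `td_update` and its parameters are hypothetical.

```python
from collections import defaultdict


def td_update(q, s, s_next, reward, a_next, alpha=0.1, gamma=0.0):
    """One TD(0) (SARSA-style) step for the unconditional approach.

    Assumption for illustration: an action is identified with the product
    it recommends, so the observed transition s -> s_next is credited to
    q[(s, s_next)] whether or not s_next was actually recommended.
    a_next is the product the current policy recommends in s_next.
    """
    target = reward + gamma * q[(s_next, a_next)]
    q[(s, s_next)] += alpha * (target - q[(s, s_next)])
    return q[(s, s_next)]


q = defaultdict(float)
td_update(q, "A", "B", 10.0, a_next="C", alpha=0.5)  # 0.0 + 0.5 * (10.0 - 0.0) = 5.0
td_update(q, "A", "B", 10.0, a_next="C", alpha=0.5)  # 5.0 + 0.5 * (10.0 - 5.0) = 7.5
print(q[("A", "B")])
```

With $\gamma = 0$ the estimate tracks the reward observed after the transition, weighted by how often the transition occurs.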
Example 5.1 In the following, we illustrate the results of the unconditional approach by means of a practical example, using the online verification methods described in Sect. 4.4. We forgo the chain property, i.e., we set $\gamma = 0$. Thus, we use the simplified P-Version according to (5.1) with adaptive updates of the transition probabilities $p_{ss_a}$ and rewards $r_{ss_a}$. To observe unbiased user behavior, only transactions from sessions belonging to the control group were included in the analysis.