and its inverse through $p^a$:

$$p = F^{-1}_{\Pi^a}(p^a) \qquad (5.15)$$
The inverse mapping requires the solution of an equation system which is not difficult to compute. Although (5.14) is formally nonlinear with respect to $p$, in reality the solution can be obtained in quite a similar way. In fact, we first consider all recommended successor products $s' \in S^a$ and calculate their corresponding conditional probabilities $p^{a_{s'}}_{ss'}$. This requires the solution of a linear equation system with $|S^a|$ unknowns. Then we turn to the remaining, not-recommended successor states $s' \notin S^a$ and directly calculate their unconditional probabilities $p_{ss'}$.
In this way, we end up with Algorithm 5.2 for multiple recommendations, which is quite similar to Algorithm 5.1 for single recommendations.
Algorithm 5.2: Update of the internal from conditional probabilities for multiple recommendations, linear mapping

Input: vector of internal probabilities $^j p$ and fixed probabilities $^j\Pi^a$, delivered recommendations $a = (a_1, \ldots, a_k)$, index of product transition $l$, step size $\alpha_j$
Output: updated vector of internal probabilities $^{j+1}p$ and $^{j+1}\Pi^a$

1: procedure UPDATE_P_DP_MULTI_LIN($^j p$, $^j\Pi^a$, $a$, $l$, $\alpha_j$)
2:   $^j p^a := F_{^j\Pi^a}(^j p)$   ▹ conversion into conditional probabilities
3:   $^{j+1} p^a :=$ UPDATE_P_SINGLE($^j p^a$, $l$, $\alpha_j$)   ▹ update of conditional probabilities
4:   $^{j+1} p := F^{-1}_{^j\Pi^a}(^{j+1} p^a)$   ▹ conversion into internal probabilities
5:   $^{j+1}\Pi^a := {}^j\Pi^a$   ▹ unchanged take-over of the fixed component
6:   return ($^{j+1} p$, $^{j+1}\Pi^a$)
7: end procedure
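To make the structure of this update concrete, the following is a minimal Python sketch rather than the book's implementation: the callables forward_map, inverse_map, and update_p_single are hypothetical stand-ins for $F_{\Pi^a}$, $F^{-1}_{\Pi^a}$ from (5.14)/(5.15) and for Algorithm 5.1, which are defined elsewhere; only the orchestration of the three steps follows Algorithm 5.2.

def update_p_dp_multi_lin(p, Pi_a, a, l, alpha,
                          forward_map, inverse_map, update_p_single):
    # Hypothetical sketch of Algorithm 5.2: forward_map, inverse_map and
    # update_p_single are assumed stand-ins for F_{Pi^a}, its inverse and
    # Algorithm 5.1; the recommendations a enter only implicitly via Pi_a.
    p_a = forward_map(p, Pi_a)                 # 2: internal -> conditional probabilities
    p_a_new = update_p_single(p_a, l, alpha)   # 3: single-recommendation update
    p_new = inverse_map(p_a_new, Pi_a)         # 4: conditional -> internal probabilities
    return p_new, Pi_a                         # 5/6: fixed component taken over unchanged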
We now turn to the action-value function. For complexity reasons we cannot
check all combinations of single recommendations in order to determine the greedy
policy. We need a more efficient approach.
To this end, we plug (4.3) into (3.5) and rewrite the action-value function for multiple recommendations:
$$q_\pi(s, a) = \sum_{s'} p^a_{ss'}\, r_{ss'} = \sum_{s'} \left( \frac{1}{k} \sum_{i=1}^{k} p^{a_i}_{ss'} \right) r_{ss'} = \frac{1}{k} \sum_{i=1}^{k} \sum_{s'} p^{a_i}_{ss'}\, r_{ss'} = \frac{1}{k} \sum_{i=1}^{k} q_\pi(s, a_i).$$
This nice result tells us that in order to find the highest action value of k recommendations, we just need to select the k recommendations with the highest single action values. The same holds for the full action-value function (3.6).
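Since the multi-recommendation action value is just the average of the single action values, the greedy choice reduces to a top-k selection. The following small Python sketch illustrates this; the dictionary q_single of single action values is a hypothetical input assumed to have been computed beforehand.

import heapq

def greedy_multi_recommendation(q_single, k):
    # Select the k single recommendations with the highest action values
    # q_pi(s, a_i) for the current state s; by the derivation above this
    # maximizes the averaged multi-recommendation action value.
    return tuple(heapq.nlargest(k, q_single, key=q_single.get))

# Hypothetical example with four candidate products:
q_example = {'A': 0.10, 'B': 0.35, 'C': 0.20, 'D': 0.05}
print(greedy_multi_recommendation(q_example, k=2))  # -> ('B', 'C')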