and its inverse through $p^a$:

$$p^{[\,]} = F_{\Pi^a}^{-1}\left(p^a\right) \qquad (5.15)$$
The inverse mapping requires the solution of an equation system, which is not difficult to compute. Although (5.14) is formally nonlinear with respect to $p^{[\,]}$, in practice the solution proceeds in much the same way. First, we consider all recommended successor products $s' \in S^a$ and calculate their corresponding conditional probabilities $p^a_{ss'}$. This requires the solution of a linear equation system with $|S^a|$ unknowns. Then we turn to the remaining, not-recommended successor states $s' \notin S^a$ and directly calculate their unconditional probabilities $p_{ss'}$.
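The two-stage computation can be sketched as follows. The coefficient matrix `A` and right-hand side `b` are hypothetical stand-ins for the coefficients that (5.14) yields for the recommended successors, since (5.14) itself is not reproduced here:

```python
import numpy as np

def inverse_mapping_sketch(A, b, p_unrec):
    """Two-stage sketch of the inverse mapping (5.15).

    Stage 1: solve the linear system A x = b for the conditional
    probabilities p^a_{ss'} of the recommended successors s' in S^a;
    this is the linear equation system with |S^a| unknowns mentioned
    in the text (A and b are hypothetical placeholders for the
    coefficients arising from (5.14)).
    Stage 2: the unconditional probabilities p_{ss'} of the remaining,
    not-recommended successors are computed directly; here they are
    simply passed through unchanged.
    """
    p_rec = np.linalg.solve(A, b)  # stage 1: |S^a| x |S^a| linear system
    return p_rec, p_unrec          # stage 2: direct computation
```

This is only a structural sketch of the two stages, not the actual coefficients of (5.14).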
That way, we end up with Algorithm 5.2 for multiple recommendations, which is quite similar to Algorithm 5.1 for single recommendations.
Algorithm 5.2: Update of the internal from conditional probabilities for multiple recommendations, linear mapping

Input: vector of internal probabilities $^{j}p^{[\,]}$ and fixed probabilities $^{j}\Pi^a$, delivered recommendations $a = (a_1, \dots, a_k)$, index of product transition $l$, step size $\alpha_j$
Output: updated vector of internal probabilities $^{j+1}p^{[\,]}$ and $^{j+1}\Pi^a$

1: procedure UPDATE_P_DP_MULTI_LIN($^{j}p^{[\,]}$, $^{j}\Pi^a$, $a$, $l$, $\alpha_j$)
2: $\quad {}^{j}p^a := F_{^{j}\Pi^a}\left({}^{j}p^{[\,]}\right)$ ⊳ conversion into conditional probabilities
3: $\quad {}^{j+1}p^a :=$ UPDATE_P_SINGLE($^{j}p^a$, $l$, $\alpha_j$) ⊳ update of conditional probabilities
4: $\quad {}^{j+1}p^{[\,]} := F_{^{j}\Pi^a}^{-1}\left({}^{j+1}p^a\right)$ ⊳ conversion into internal probabilities
5: $\quad {}^{j+1}\Pi^a := {}^{j}\Pi^a$ ⊳ unchanged take-over of the fixed component
6: $\quad$ return ($^{j+1}p^{[\,]}$, $^{j+1}\Pi^a$)
7: end procedure
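Algorithm 5.2 can be transliterated into Python roughly as follows. The conversion map, its inverse, and the single-recommendation update are passed in as hypothetical callables, since their concrete forms are given by (5.14), (5.15), and Algorithm 5.1, which are not spelled out here:

```python
def update_p_dp_multi_lin(p_int, pi_a, a, l, alpha, F, F_inv, update_p_single):
    """Sketch of Algorithm 5.2 (all signatures are assumptions).

    p_int : vector of internal probabilities  ^j p^[]
    pi_a  : fixed probabilities               ^j Pi^a
    a     : delivered recommendations (a_1, ..., a_k)
    l     : index of the product transition
    alpha : step size alpha_j
    F, F_inv : the mapping (5.14) and its inverse (5.15)
    update_p_single : Algorithm 5.1 for single recommendations
    """
    p_cond = F(p_int, pi_a, a)                      # line 2: to conditional
    p_cond_new = update_p_single(p_cond, l, alpha)  # line 3: single update
    p_int_new = F_inv(p_cond_new, pi_a, a)          # line 4: back to internal
    pi_a_new = pi_a                                 # line 5: fixed part unchanged
    return p_int_new, pi_a_new                      # line 6
```

The sketch only fixes the control flow of the algorithm; all numerical work happens inside the callables.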
We now turn to the action-value function. For complexity reasons we cannot
check all combinations of single recommendations in order to determine the greedy
policy. We need a more efficient approach.
For the linear mapping, the action value of multiple recommendations $a = (a_1, \dots, a_k)$ decomposes into the single action values:

$$q_\pi(s, a) = \sum_{s'} p^a_{ss'}\, r_{ss'} = \sum_{s'} \left( \frac{1}{k} \sum_{i=1}^{k} p^{a_i}_{ss'} \right) r_{ss'} = \frac{1}{k} \sum_{i=1}^{k} \sum_{s'} p^{a_i}_{ss'}\, r_{ss'} = \frac{1}{k} \sum_{i=1}^{k} q_\pi(s, a_i).$$
This nice result tells us that in order to find the highest action value of $k$ recommendations, we just need to select the $k$ recommendations with the highest single action values. The same holds for the full action-value function (3.6).
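Assuming the single action values are available (here as a plain dict, a hypothetical stand-in for the engine's actual data structure), the greedy multiple recommendation can be sketched as:

```python
def greedy_multi_recommendation(single_q, k):
    """Return the k products with the highest single action values.

    By the linearity result above, the combined action value of the
    selected tuple a = (a_1, ..., a_k) is the mean of its single
    action values, so the top-k tuple maximizes q_pi(s, a).
    single_q maps each candidate recommendation to q_pi(s, a_i).
    """
    ranked = sorted(single_q, key=single_q.get, reverse=True)
    top_k = tuple(ranked[:k])
    combined_q = sum(single_q[a] for a in top_k) / k  # mean of single values
    return top_k, combined_q

# Example with made-up single action values:
q = {"p1": 0.2, "p2": 0.7, "p3": 0.4, "p4": 0.6}
recs, q_val = greedy_multi_recommendation(q, 2)
```

Selecting the top $k$ entries costs $O(|A| \log |A|)$ with a full sort, instead of checking all $\binom{|A|}{k}$ combinations.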