where $k$ is the number of all the other states $s' \neq s_a$. This is the most complicated case. The above equation follows from an elementary limit consideration with the equation

\[
p_{ss'} = \frac{1 - p_{ss_a}}{k} \quad \forall s' \neq s_a :
\]
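\[
\begin{aligned}
\lim_{p_{ss_a} \to 1} \sum_{s' \neq s_a} \frac{1 - p^{a}_{ss_a}}{1 - p_{ss_a}} \, p_{ss'} \, r_{ss'}
&= \left(1 - p^{a}_{ss_a}\right) \lim_{p_{ss_a} \to 1} \sum_{s' \neq s_a} \frac{1}{1 - p_{ss_a}} \cdot \frac{1 - p_{ss_a}}{k} \, r_{ss'} \\
&= \frac{1 - p^{a}_{ss_a}}{k} \lim_{p_{ss_a} \to 1} \sum_{s' \neq s_a} r_{ss'} .
\end{aligned}
\]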
Of course, the distribution of the unconditional probabilities $p_{ss'}$ may also be modeled differently (which may lead to a different result), but this is the most natural approach. The interpretation is as follows: since the transition always leads to $s_a$ in the recommendation-free case, the distribution of the $p_{ss'}$ is unknown if, in the recommendation case, the transition leads to some $s' \neq s_a$. Therefore, the $p_{ss'}$ are assumed to be equal. Hence, the action value corresponds to the conditional action value of recommendation $a$ plus the probability of nonacceptance of the recommendation times the average reward over the other states $s' \neq s_a$.
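The limit consideration can be checked numerically. The following is a minimal sketch, not part of the original text: it assumes that the rewards of the other states are rescaled by the factor (1 - p_cond) / (1 - p_uncond), where p_cond stands for the conditional acceptance probability and p_uncond for the unconditional probability of reaching $s_a$, and that the remaining unconditional probabilities are uniform. All function and variable names are hypothetical.

```python
def scaled_other_reward(p_cond, p_uncond, rewards):
    """Rescaled expected reward over the k other states s' != s_a.

    Assumes uniform unconditional probabilities p_ss' = (1 - p_uncond) / k
    and the rescaling factor (1 - p_cond) / (1 - p_uncond); both are
    assumptions made for this illustration.
    """
    k = len(rewards)
    p_other = (1.0 - p_uncond) / k
    scale = (1.0 - p_cond) / (1.0 - p_uncond)
    return sum(scale * p_other * r for r in rewards)


def limit_value(p_cond, rewards):
    """Nonacceptance probability times the average reward of the other states."""
    return (1.0 - p_cond) * sum(rewards) / len(rewards)


# Hypothetical numbers: three other states with rewards 1, 2 and 6.
rewards = [1.0, 2.0, 6.0]
p_cond = 0.3
for p_uncond in (0.9, 0.99, 0.999999):
    gap = abs(scaled_other_reward(p_cond, p_uncond, rewards)
              - limit_value(p_cond, rewards))
    print(p_uncond, gap)
```

Under the uniform assumption the factor (1 - p_uncond) cancels, so the rescaled sum does not depend on p_uncond at all and the printed gaps stay at floating-point noise. This is why the limit for p_uncond approaching 1 exists and equals the nonacceptance probability times the average reward.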
Applying the approach described for estimation of the transition probabilities in (3.5), we again obtain as a result an equation similar to (5.6), albeit of course more
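complex:

\[
\begin{aligned}
q^{\pi}(s,a) ={}& p^{a}_{ss_a} \left[ r^{a}_{ss_a} + \gamma \sum_{a'} \pi(s_a,a') \, q^{\pi}(s_a,a') \right] \\
&+ c(s,a) \sum_{s' \neq s_a} p_{ss'} \left[ r_{ss'} + \gamma \sum_{a'} \pi(s',a') \, q^{\pi}(s',a') \right].
\end{aligned}
\tag{5.9}
\]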
This is then solved in realtime once again using ADP. Concerning a corresponding
TD algorithm, it is not easy to derive because of the nonlinearity of (5.8) with respect to $p_{ss_a}$. We leave this as an open problem. Nevertheless, we will also consider the
5.2.3 Estimation of Transition Probabilities
The conditional transition probabilities $p^{a}_{ss_a}$ may be computed from the transactions according to Algorithm 4.1 or, in case multiple recommendations have been issued, Algorithm 4.2.
Computation of the unconditional transition probabilities $p_{ss'}$ has not yet been
addressed. It may be calculated either from sessions in the control group (the topic