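where $k$ is the number of all the other states $s' \neq s_a$. This is the most complicated
case. The above equation follows from an elementary limit consideration
with the assumption $p_{ss'} = \frac{1 - p_{ss_a}}{k} \;\; \forall s' \neq s_a$, where $p^{a}_{ss_a}$ denotes the
conditional and $p_{ss_a}$ the unconditional probability of the transition to $s_a$:
\[
\lim_{p_{ss_a} \to 1} \frac{1 - p^{a}_{ss_a}}{1 - p_{ss_a}} \sum_{s' \neq s_a} \frac{1 - p_{ss_a}}{k} \, r_{ss'}
= \lim_{p_{ss_a} \to 1} \frac{1 - p^{a}_{ss_a}}{1 - p_{ss_a}} \cdot \frac{1 - p_{ss_a}}{k} \sum_{s' \neq s_a} r_{ss'}
= \left( 1 - p^{a}_{ss_a} \right) \frac{1}{k} \sum_{s' \neq s_a} r_{ss'} .
\]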
Of course, the distribution of the unconditional probabilities $p_{ss'}$ may also be
modeled differently (which may lead to a different result), but this is the most
natural approach. The interpretation is as follows: since in the recommendation-free
case the transition always leads to $s_a$, the distribution of the $p_{ss'}$ is unknown if, in the
recommendation case, the transition leads to some $s' \neq s_a$. Therefore, the $p_{ss'}$ are
assumed to be equal. Hence, the action value corresponds to the conditional action
value of recommendation $a$ plus the probability of nonacceptance of the
recommendation times the average reward over the other states $s' \neq s_a$.
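As a numerical illustration of the limit above, the following Python sketch compares the general term for the other states with its limiting form under the equal-distribution assumption. The function names and example numbers are hypothetical; `p_cond` and `p_uncond` stand for the conditional and the unconditional probability of reaching $s_a$, and the action value is taken for a discount rate of zero. It is a sketch under these assumptions, not an implementation taken from the text.

```python
def other_states_term(p_cond, p_uncond, rewards):
    """Term for transitions to the k states other than s_a.

    Assumption (as in the text): the unconditional probabilities of the
    k other states are all equal to (1 - p_uncond) / k.
    Requires p_uncond < 1, otherwise the ratio is undefined.
    """
    k = len(rewards)
    p_other = (1.0 - p_uncond) / k
    weighted = sum(p_other * r for r in rewards)
    return (1.0 - p_cond) / (1.0 - p_uncond) * weighted


def other_states_term_limit(p_cond, rewards):
    """Limit for p_uncond -> 1: nonacceptance probability times average reward."""
    return (1.0 - p_cond) * sum(rewards) / len(rewards)


def action_value_limit(p_cond, r_accept, rewards):
    """Action value in the limiting case, assuming a discount rate of zero."""
    return p_cond * r_accept + other_states_term_limit(p_cond, rewards)


if __name__ == "__main__":
    rewards = [1.0, 2.0, 6.0]  # hypothetical rewards of the other states
    for p_uncond in (0.9, 0.99, 0.999999):
        print(p_uncond, other_states_term(0.3, p_uncond, rewards))
    print("limit", other_states_term_limit(0.3, rewards))
```

Under the equal-distribution assumption the factor $1 - p_{ss_a}$ cancels exactly, so the general term already equals its limit for every $p_{ss_a} < 1$; the limit only removes the undefined case $p_{ss_a} = 1$.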
The approach described for estimation of the transition probabilities in (3.5)
can now be applied similarly for a positive discount rate γ (3.6), so that as a
result we again obtain an equation similar to (5.6), albeit of course more
complex:
\[
q^{\pi}(s, a) = p^{a}_{ss_a} \left[ r_{ss_a} + \gamma \sum_{a'} \pi(s_a, a') \, q^{\pi}(s_a, a') \right]
+ c(s, a) \sum_{s' \neq s_a} p_{ss'} \left[ r_{ss'} + \gamma \sum_{a'} \pi(s', a') \, q^{\pi}(s', a') \right] . \qquad (5.9)
\]
This is then solved in real time, once again using ADP. A corresponding
TD algorithm is not easy to derive because of the nonlinearity of (5.8) with respect
to $p_{ss_a}$. We leave this as an open problem. Nevertheless, we will also consider the
TD version later on, especially in Chaps. 6 and 10.
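To make the structure of (5.9) concrete, the sketch below applies its right-hand side repeatedly until the action values settle. This is plain iterative policy evaluation, used here as a simple stand-in and not the ADP scheme referred to in the text. Several things are assumptions made only for this illustration: states and recommendations share the index set 0..n-1, so that recommending `a` leads to state `a`; the factor c(s, a) is taken to be (1 - p_cond[s][a]) / (1 - p_unc[s][a]); and all function and parameter names are hypothetical.

```python
def bellman_sweep(q, p_cond, r_cond, p_unc, r_unc, policy, gamma):
    """Apply the right-hand side of (5.9) once to all pairs (s, a)."""
    n = len(p_unc)
    # State values under the policy: v(s') = sum_{a'} pi(s', a') * q(s', a')
    v = [sum(policy[s][a] * q[s][a] for a in range(n)) for s in range(n)]
    new_q = [[0.0] * n for _ in range(n)]
    for s in range(n):
        for a in range(n):
            # Recommendation accepted: transition to s_a (index a by assumption)
            accepted = p_cond[s][a] * (r_cond[s][a] + gamma * v[a])
            # ASSUMPTION: c(s, a) rescales the unconditional probabilities
            # of the other states; requires p_unc[s][a] < 1.
            c = (1.0 - p_cond[s][a]) / (1.0 - p_unc[s][a])
            others = sum(
                p_unc[s][t] * (r_unc[s][t] + gamma * v[t])
                for t in range(n)
                if t != a
            )
            new_q[s][a] = accepted + c * others
    return new_q


def evaluate_policy(p_cond, r_cond, p_unc, r_unc, policy, gamma, sweeps=200):
    """Iterate the sweep a fixed number of times, starting from q = 0."""
    n = len(p_unc)
    q = [[0.0] * n for _ in range(n)]
    for _ in range(sweeps):
        q = bellman_sweep(q, p_cond, r_cond, p_unc, r_unc, policy, gamma)
    return q


if __name__ == "__main__":
    # Hypothetical two-state example with a uniform policy
    p_unc = [[0.5, 0.5], [0.5, 0.5]]
    rewards = [[1.0, 2.0], [3.0, 4.0]]
    p_cond = [[0.6, 0.8], [0.7, 0.9]]
    policy = [[0.5, 0.5], [0.5, 0.5]]
    for row in evaluate_policy(p_cond, rewards, p_unc, rewards, policy, 0.5):
        print(row)
```

With the assumed c(s, a), the weights of the accepted and the other transitions sum to one for every pair (s, a), so for γ < 1 the iteration is a contraction and converges to the unique solution of the linear system.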
5.2.3 Estimation of Transition Probabilities
The conditional transition probabilities $p^{a}_{ss_a}$ may be computed from the transactions
according to Algorithm 4.1 or, if multiple recommendations have been
issued, Algorithm 4.2.
Computation of the unconditional transition probabilities $p_{ss'}$ has not yet been
addressed. They may be calculated either from sessions in the control group (the topic