Fig. 3.3 Relationship between the state-value and action-value functions v and q (panels a and b; diagram not reproduced)
Fig. 3.4 Bellman equation for both the action-value function q (panel a) and the state-value function v (panel b); diagram not reproduced
A Bellman equation of the form (3.6) also holds for $q^{\pi}(s', a')$, with
new subsequent states $s''$ and actions $a''$ (which may in part coincide with the
original $s$ and $a$!). Solving (3.6) is therefore usually a more complex undertaking
than the 1-step special case (3.5). This reflects the fact that we take into account the
entire chain of subsequent transactions, which is how we address Problem 4 in Chap. 2.
Since our transition probabilities $p^{a}_{ss'}$ in fact depend on the action $a$, we learn
directly from this dependence and thus also solve Problem 1 in Chap. 2.
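To illustrate why (3.6) cannot be solved in a single step, the following sketch evaluates $q^{\pi}$ by applying the Bellman update repeatedly until the values stop changing. It assumes the common form of (3.6), $q^{\pi}(s,a) = \sum_{s'} p^{a}_{ss'} [ r^{a}_{ss'} + \gamma \sum_{a'} \pi(s',a')\, q^{\pi}(s',a') ]$, which is not reproduced in this section. The two-state, two-action model, all its numbers, and the names `P`, `R`, `pi`, `gamma`, and `evaluate_q` are hypothetical and chosen only for illustration.

```python
import numpy as np

def evaluate_q(P, R, pi, gamma, tol=1e-10, max_iter=10_000):
    """Iterate the (assumed) Bellman update for q until it converges.

    P[a, s, t] : probability of moving from state s to state t under action a
    R[a, s, t] : reward received on that transition
    pi[s, a]   : probability that the policy chooses action a in state s
    """
    n_actions, n_states, _ = P.shape
    q = np.zeros((n_states, n_actions))
    for _ in range(max_iter):
        # v(s') = sum_a' pi(s', a') * q(s', a')
        v = (pi * q).sum(axis=1)
        # q_new(s, a) = sum_s' P[a, s, s'] * (R[a, s, s'] + gamma * v(s'))
        q_new = np.einsum("ast,ast->sa", P, R + gamma * v[None, None, :])
        if np.max(np.abs(q_new - q)) < tol:
            return q_new
        q = q_new
    return q

# Hypothetical model: 2 actions, 2 states (each row of P[a] sums to 1).
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.5, 0.5], [0.6, 0.4]]])
R = np.array([[[1.0, 0.0], [0.0, 2.0]],
              [[0.0, 1.0], [3.0, 0.0]]])
pi = np.array([[0.5, 0.5],
               [0.3, 0.7]])
gamma = 0.9

q = evaluate_q(P, R, pi, gamma)
print(q)
```

With `gamma = 0` the successor values drop out and the result is just the expected immediate reward, so the iteration is only needed when the chain of later steps is taken into account.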
For the sake of completeness, we should also mention that in (3.4) we can
conversely eliminate the action-value function $q^{\pi}$. We then obtain the Bellman
equation for the state-value function (Fig. 3.4b):
$$v^{\pi}(s) = \sum_{a} \pi(s,a) \sum_{s'} p^{a}_{ss'} \left[ r^{a}_{ss'} + \gamma\, v^{\pi}(s') \right]. \qquad (3.7)$$
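Because (3.7) is linear in $v^{\pi}$, a small model with known transition probabilities and rewards can be solved directly rather than iteratively. The sketch below rewrites (3.7) as $(I - \gamma P_{\pi})\, v = r_{\pi}$ and solves that linear system. The function name `solve_v`, the array layout, and the example numbers are hypothetical and serve only as an illustration.

```python
import numpy as np

def solve_v(P, R, pi, gamma):
    """Solve the Bellman equation (3.7) for v as a linear system.

    P[a, s, t] : probability of moving from state s to state t under action a
    R[a, s, t] : reward received on that transition
    pi[s, a]   : probability that the policy chooses action a in state s
    """
    # Policy-averaged transition matrix: P_pi[s, t] = sum_a pi(s, a) * P[a, s, t]
    P_pi = np.einsum("sa,ast->st", pi, P)
    # Policy-averaged expected reward: r_pi[s] = sum_a pi(s, a) sum_t P[a, s, t] * R[a, s, t]
    r_pi = np.einsum("sa,ast,ast->s", pi, P, R)
    n_states = P_pi.shape[0]
    # (I - gamma * P_pi) v = r_pi, uniquely solvable for 0 <= gamma < 1
    return np.linalg.solve(np.eye(n_states) - gamma * P_pi, r_pi)

# Hypothetical model: 2 actions, 2 states (each row of P[a] sums to 1).
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.5, 0.5], [0.6, 0.4]]])
R = np.array([[[1.0, 0.0], [0.0, 2.0]],
              [[0.0, 1.0], [3.0, 0.0]]])
pi = np.array([[0.5, 0.5],
               [0.3, 0.7]])
gamma = 0.9

v = solve_v(P, R, pi, gamma)
print(v)
```

The direct solve costs on the order of the cube of the number of states, so it is practical only for small state spaces. For larger ones, an iterative scheme that applies the right-hand side of (3.7) repeatedly is the usual alternative.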