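\begin{align*}
\mathbb{P}^{\delta,r}_{\max}(s)
&= \sup\{\, r\cdot\Delta' \mid s \Longrightarrow_{\delta} \Delta' \,\} \\
&= \sup\{\, r\cdot(\Delta_0^{\times} + \delta\cdot\Delta'') \mid s = \Delta_0^{\rightarrow} + \Delta_0^{\times},\ \Delta_0^{\rightarrow} \xrightarrow{\tau} \Delta_1, \text{ and } \Delta_1 \Longrightarrow_{\delta} \Delta'' \text{ for some } \Delta_0^{\rightarrow}, \Delta_0^{\times}, \Delta_1, \Delta'' \,\} \\
&= \sup\{\, r\cdot\Delta_0^{\times} + \delta\cdot r\cdot\Delta'' \mid s = \Delta_0^{\rightarrow} + \Delta_0^{\times},\ \Delta_0^{\rightarrow} \xrightarrow{\tau} \Delta_1, \text{ and } \Delta_1 \Longrightarrow_{\delta} \Delta'' \text{ for some } \Delta_0^{\rightarrow}, \Delta_0^{\times}, \Delta_1, \Delta'' \,\} \\
&= \sup\{\, r\cdot\Delta_0^{\times} + \delta\cdot\sup\{\, r\cdot\Delta'' \mid \Delta_1 \Longrightarrow_{\delta} \Delta'' \,\} \mid s = \Delta_0^{\rightarrow} + \Delta_0^{\times} \text{ and } \Delta_0^{\rightarrow} \xrightarrow{\tau} \Delta_1 \text{ for some } \Delta_0^{\rightarrow}, \Delta_0^{\times}, \Delta_1 \,\} \\
&= \sup\{\, r\cdot\Delta_0^{\times} + \delta\cdot\mathbb{P}^{\delta,r}_{\max}(\Delta_1) \mid s = \Delta_0^{\rightarrow} + \Delta_0^{\times} \text{ and } \Delta_0^{\rightarrow} \xrightarrow{\tau} \Delta_1 \text{ for some } \Delta_0^{\rightarrow}, \Delta_0^{\times}, \Delta_1 \,\} \\
&= \sup\{\, (1-p)\cdot r(s) + p\delta\cdot\mathbb{P}^{\delta,r}_{\max}(\Delta_1) \mid p \in [0,1] \text{ and } s \xrightarrow{\tau} \Delta_1 \text{ for some } \Delta_1 \,\}
  && [\,s \text{ can be split into } ps + (1-p)s \text{ only}\,] \\
&= \sup\{\, (1-p)\cdot r(s) + p\delta\cdot\sup\{\, \mathbb{P}^{\delta,r}_{\max}(\Delta_1) \mid s \xrightarrow{\tau} \Delta_1 \,\} \mid p \in [0,1] \,\} \\
&= \max\bigl(\, r(s),\ \delta\cdot\sup\{\, \mathbb{P}^{\delta,r}_{\max}(\Delta_1) \mid s \xrightarrow{\tau} \Delta_1 \,\} \,\bigr) \\
&= \delta\cdot\mathbb{P}^{\delta,r}_{\max}(\Delta)
  && [\,\text{as } dp \text{ is max-seeking}\,] \\
&= F^{\delta,dp,r}(\mathbb{P}^{\delta,r}_{\max})(s)
\end{align*}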
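The equation $\mathbb{P}^{\delta,r}_{\max}(s) = \max(r(s),\ \delta\cdot\sup\{\mathbb{P}^{\delta,r}_{\max}(\Delta_1) \mid s \xrightarrow{\tau} \Delta_1\})$ reached in the derivation above can be explored numerically. The following is only a minimal sketch under assumptions that are not part of the text: the state space is finite, each state has finitely many $\tau$ transitions, and $\delta < 1$, so that repeatedly applying the right-hand side converges. The names `max_payoff`, `expected`, `reward` and `tau` are hypothetical and chosen for illustration.

```python
from typing import Dict, List

# A subdistribution maps states to probability mass (total mass at most 1).
Dist = Dict[str, float]


def expected(f: Dict[str, float], dist: Dist) -> float:
    """Lift a state function f to a subdistribution: sum of dist(s) * f(s)."""
    return sum(mass * f[s] for s, mass in dist.items())


def max_payoff(
    reward: Dict[str, float],
    tau: Dict[str, List[Dist]],
    delta: float,
    tol: float = 1e-12,
    max_iter: int = 10_000,
) -> Dict[str, float]:
    """Iterate f(s) := max(r(s), delta * max over s -tau-> D of f(D)).

    Assumption (not from the text): finite states, finitely many tau
    transitions per state, and delta < 1 so the iteration converges.
    A state without tau transitions simply keeps its reward r(s).
    """
    values = {s: 0.0 for s in reward}
    for _ in range(max_iter):
        new_values = {}
        for s in reward:
            successors = tau.get(s, [])
            if successors:
                best = max(expected(values, d) for d in successors)
                new_values[s] = max(reward[s], delta * best)
            else:
                new_values[s] = reward[s]
        change = max((abs(new_values[s] - values[s]) for s in reward), default=0.0)
        values = new_values
        if change < tol:
            break
    return values


# Hypothetical three-state example.
example_reward = {"s0": 0.0, "s1": 1.0, "s2": 0.5}
example_tau = {
    "s0": [{"s1": 0.5, "s2": 0.5}],
    "s2": [{"s1": 1.0}],
}
example_values = max_payoff(example_reward, example_tau, delta=0.9)
```

In this example the state `s2` prefers moving on (payoff about 0.9) over stopping (reward 0.5), and `s0` obtains about 0.9 · (0.5 · 1 + 0.5 · 0.9) = 0.855.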
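Definition 6.17 Let $\Delta$ be a subdistribution and $dp$ a static derivative policy. We define a collection of subdistributions $\Delta_k$ as follows.

\begin{align*}
\Delta_0 &= \Delta \\
\Delta_{k+1} &= \sum \{\, \Delta_k(s)\cdot dp(s) \mid s \in \lceil\Delta_k\rceil \text{ and } dp(s){\downarrow} \,\} \qquad \text{for all } k \geq 0 .
\end{align*}

Then $\Delta_k^{\times}$ is obtained from $\Delta_k$ by letting

\[
\Delta_k^{\times}(s) =
\begin{cases}
0 & \text{if } dp(s){\downarrow} \\
\Delta_k(s) & \text{otherwise}
\end{cases}
\]

for all $k \geq 0$. Then we write $\Delta \Longrightarrow_{\delta,dp} \Delta'$ for the discounted weak derivation that determines a unique subdistribution $\Delta'$ with $\Delta' = \sum_{k=0}^{\infty} \delta^k \Delta_k^{\times}$.

In other words, if $\Delta \Longrightarrow_{\delta,dp} \Delta'$ then $\Delta'$ comes from the discounted weak derivation $\Delta \Longrightarrow_{\delta} \Delta'$ that is constructed by following the derivative policy $dp$ when choosing $\tau$ transitions from each state. In the special case when the discount factor $\delta = 1$, we see that $\Longrightarrow_{1,dp}$ becomes $\Longrightarrow_{dp}$ as defined on page 176.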
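The construction in Definition 6.17 can be followed step by step in code. The sketch below is illustrative only and rests on assumptions not made in the text: the state space is finite, the policy is represented as a dictionary in which a missing key means $dp(s)$ is undefined, and the infinite sum $\sum_k \delta^k \Delta_k^{\times}$ is truncated once the moving mass drops below a tolerance or a step limit is reached. The names `discounted_derivative`, `start`, `policy`, `max_steps` and `tol` are hypothetical.

```python
from typing import Dict

# A subdistribution maps states to probability mass (total mass at most 1).
Dist = Dict[str, float]


def discounted_derivative(
    start: Dist,
    policy: Dict[str, Dist],
    discount: float,
    max_steps: int = 1000,
    tol: float = 1e-12,
) -> Dist:
    """Approximate the subdistribution sum_k discount**k * stopped_k.

    policy[s] is the tau successor chosen by the static policy at s;
    a state absent from policy is one where the policy is undefined,
    so the mass sitting on it stops there (the "times" component).
    Truncating the sum is an assumption of this sketch, not of the text.
    """
    result: Dist = {}
    current = dict(start)  # Delta_k, starting from Delta_0 = start
    factor = 1.0           # discount ** k
    for _ in range(max_steps):
        if sum(current.values()) < tol:
            break
        following: Dist = {}  # Delta_{k+1}
        for s, mass in current.items():
            if s in policy:
                # Policy defined at s: the mass moves on along the chosen tau step.
                for t, p in policy[s].items():
                    following[t] = following.get(t, 0.0) + mass * p
            else:
                # Policy undefined at s: the mass stops, weighted by discount ** k.
                result[s] = result.get(s, 0.0) + factor * mass
        current = following
        factor *= discount
    return result


# Hypothetical example: s0 moves to s1 or s2 with equal probability, s2 moves to s1.
example_policy = {"s0": {"s1": 0.5, "s2": 0.5}, "s2": {"s1": 1.0}}
example_result = discounted_derivative({"s0": 1.0}, example_policy, discount=0.9)
```

Here half of the mass reaches `s1` after one step and the other half after two steps, so the result assigns `s1` approximately 0.9 · 0.5 + 0.81 · 0.5 = 0.855. With `discount=1.0` the same call yields the full mass 1.0 on `s1`, matching the remark that the undiscounted case is recovered when the discount factor is 1.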
 