1. f satisfies condition C1. For all $i, j_1, j_2 \in \mathbb{N}$, if $j_1 \le j_2$ then $\delta_{j_1} \le \delta_{j_2}$. It follows that
\[
|f(i, j_1)| = |\delta_{j_1}^{i} (r \cdot \Delta_i)| \le |\delta_{j_2}^{i} (r \cdot \Delta_i)| = |f(i, j_2)|.
\]
2. f satisfies condition C2. For any $i \in \mathbb{N}$, we have
\[
\lim_{j \to \infty} |f(i, j)| = \lim_{j \to \infty} |\delta_j^{i} (r \cdot \Delta_i)| = |r \cdot \Delta_i|. \tag{6.17}
\]
3. f satisfies condition C3. For any $n \in \mathbb{N}$, the partial sum $S_n = \sum_{i=0}^{n} \lim_{j \to \infty} |f(i, j)|$ is bounded because
\[
\sum_{i=0}^{n} \lim_{j \to \infty} |f(i, j)| = \sum_{i=0}^{n} |r \cdot \Delta_i| \le \sum_{i=0}^{\infty} |r \cdot \Delta_i| \le \sum_{i=0}^{\infty} |\Delta_i| = |\Delta|,
\]
where the first equality is justified by (6.17).
4. f satisfies condition C4. For any $i, j_1, j_2 \in \mathbb{N}$ with $j_1 \le j_2$, suppose we have $f(i, j_1) = \delta_{j_1}^{i} (r \cdot \Delta_i) > 0$. Then $r \cdot \Delta_i > 0$ and it follows immediately that $f(i, j_2) = \delta_{j_2}^{i} (r \cdot \Delta_i) > 0$.
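Therefore, we can use Proposition 2.2 to make the following inference:
\[
\lim_{j \to \infty} \sum_{i=0}^{\infty} \delta_j^{i} (r \cdot \Delta_i)
= \sum_{i=0}^{\infty} \lim_{j \to \infty} \delta_j^{i} (r \cdot \Delta_i)
= \sum_{i=0}^{\infty} r \cdot \Delta_i
= r \cdot \sum_{i=0}^{\infty} \Delta_i
= r \cdot \Delta.
\]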
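The exchange of limit and sum above can be checked numerically on a small example. The following is a minimal sketch, not part of the proof: the numbers `a_i` stand in for the reals $r \cdot \Delta_i$, and both the sequence `terms` and the discount factors `deltas` are hypothetical values chosen only for illustration. The infinite sum is truncated to finitely many terms.

```python
# Numerical illustration (not a proof) of
#   lim_j sum_i delta_j**i * a_i  =  sum_i a_i,
# where a_i stands for the real number r . Delta_i.

def discounted_sum(delta, terms):
    """Return sum_i delta**i * terms[i] over a finite truncation."""
    return sum((delta ** i) * a for i, a in enumerate(terms))

# Hypothetical data: a_i = 0.3 * (1/2)**(i+1), truncated at 200 terms.
# The weights (1/2)**(i+1) have total mass at most 1.
terms = [0.3 * 0.5 ** (i + 1) for i in range(200)]

# A nondecreasing sequence of discount factors converging to 1 (assumed form).
deltas = [1 - 1 / (j + 2) for j in range(2000)]

undiscounted = discounted_sum(1.0, terms)          # sum_i a_i
values = [discounted_sum(d, terms) for d in deltas]
gaps = [undiscounted - v for v in values]          # distance to the limit

print("first gap:", gaps[0])
print("last gap: ", gaps[-1])
```

With these nonnegative terms the gap to the undiscounted sum shrinks as the discount factor grows, which is the behaviour the inference describes.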
Corollary 6.5 Let $\{\delta_j\}_{j=0}^{\infty}$ be a nondecreasing sequence of discount factors converging to 1. For any derivative policy dp and reward function r, it holds that
\[
P^{1, dp, r} = \lim_{j \to \infty} P^{\delta_j, dp, r}.
\]
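Proof We need to show that $P^{1, dp, r}(s) = \lim_{j \to \infty} P^{\delta_j, dp, r}(s)$, for any state s. Note that for any discount $\delta_j$, each state s enables a unique discounted weak derivation $s \Longrightarrow_{\delta_j, dp} \Delta^{j}$ such that $\Delta^{j} = \sum_{i=0}^{\infty} \delta_j^{i} \Delta_i$ for some properly related subdistributions $\Delta_i$. Let $\Delta = \sum_{i=0}^{\infty} \Delta_i$. We have $s \Longrightarrow_{1, dp} \Delta$. Then we can infer that
\[
\begin{aligned}
\lim_{j \to \infty} P^{\delta_j, dp, r}(s)
&= \lim_{j \to \infty} r \cdot \Delta^{j} \\
&= \lim_{j \to \infty} r \cdot \sum_{i=0}^{\infty} \delta_j^{i} \Delta_i \\
&= \lim_{j \to \infty} \sum_{i=0}^{\infty} \delta_j^{i} (r \cdot \Delta_i) \\
&= r \cdot \Delta && \text{by Lemma 6.16} \\
&= P^{1, dp, r}(s).
\end{aligned}
\]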
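The chain of equalities in the proof can be traced on concrete data. The sketch below is an illustration under stated assumptions, not the construction used in the text: subdistributions are modelled as Python dicts from state names to masses, the states `"t"` and `"u"`, their rewards, and the masses of the parts are all hypothetical, and the infinite sums are truncated to 100 terms. The helper names `dot`, `discounted_combination`, `payoff`, and `payoff_termwise` are introduced here for illustration only.

```python
from collections import defaultdict

def dot(reward, dist):
    """r . Delta: expected reward of a subdistribution given as {state: mass}."""
    return sum(reward[s] * p for s, p in dist.items())

def discounted_combination(delta, parts):
    """sum_i delta**i * Delta_i for a finite list of subdistributions."""
    total = defaultdict(float)
    for i, part in enumerate(parts):
        for s, p in part.items():
            total[s] += (delta ** i) * p
    return dict(total)

# Hypothetical data: Delta_i has mass (1/2)**(i+1), split 3:1 between t and u.
reward = {"t": 0.8, "u": -0.4}
parts = [{"t": 0.75 * 0.5 ** (i + 1), "u": 0.25 * 0.5 ** (i + 1)}
         for i in range(100)]

limit_dist = discounted_combination(1.0, parts)   # Delta = sum_i Delta_i
limit_payoff = dot(reward, limit_dist)            # r . Delta

def payoff(delta):
    """r . (sum_i delta**i * Delta_i): reward applied to the combined distribution."""
    return dot(reward, discounted_combination(delta, parts))

def payoff_termwise(delta):
    """sum_i delta**i * (r . Delta_i): reward applied part by part."""
    return sum((delta ** i) * dot(reward, part) for i, part in enumerate(parts))

for d in (0.5, 0.9, 0.999):
    print(d, payoff(d), payoff_termwise(d), limit_payoff)
```

The two functions `payoff` and `payoff_termwise` agree up to rounding, which mirrors the linearity step in the proof, and both approach `limit_payoff` as the discount factor approaches 1.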