Theorem 6.10 In a finitary pLTS, for any reward function $r$ there exists a derivative policy $dp$ such that
\[ \mathbb{P}^{1,r}_{\max} = \mathbb{P}^{1,dp,r}. \]
Proof Let $r$ be a reward function. By Proposition 6.6, for every discount factor $\delta < 1$ there exists a max-seeking derivative policy $dp$ with respect to $\delta$ and $r$ such that
\[ \mathbb{P}^{\delta,r}_{\max} = \mathbb{P}^{\delta,dp,r}. \tag{6.18} \]
Since the pLTS is finitary, there are finitely many different static derivative policies. There must exist a derivative policy $dp$ such that (6.18) holds for infinitely many discount factors. In other words, for every nondecreasing sequence $\{\delta_n\}_{n=0}^{\infty}$ converging to 1, there exist a subsequence $\{\delta_{n_j}\}_{j=0}^{\infty}$ and a derivative policy $dp$ such that
\[ \mathbb{P}^{\delta_{n_j},r}_{\max} = \mathbb{P}^{\delta_{n_j},dp,r} \quad \text{for all } j \ge 0. \tag{6.19} \]
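The pigeonhole step behind (6.19) can be spelled out as follows. This is only a sketch: the enumeration $dp_1,\dots,dp_m$ and the index sets $J_k$ are names introduced here for illustration, and every $\delta_n$ is assumed to be strictly below 1 so that (6.18) applies to it.

```latex
% Sketch: dp_1, ..., dp_m enumerate the finitely many static derivative policies.
\[
J_k = \{\, n \ge 0 \mid \mathbb{P}^{\delta_n,r}_{\max} = \mathbb{P}^{\delta_n,dp_k,r} \,\}
\quad (1 \le k \le m),
\qquad
\bigcup_{k=1}^{m} J_k = \mathbb{N} \quad \text{by (6.18)}.
\]
% A finite union of finite sets is finite, so at least one J_k is infinite.
% Listing that J_k in increasing order, n_0 < n_1 < n_2 < ..., yields the
% subsequence required in (6.19), with dp = dp_k.
```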
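For any state $s$, we infer as follows.
\[
\begin{aligned}
\mathbb{P}^{1,r}_{\max}(s)
&= \sup\{\, r\cdot\Delta \mid s \Longrightarrow \Delta \,\} \\
&= \sup\Bigl\{\, \lim_{j\to\infty} \sum_{i=0}^{\infty} \delta_{n_j}^{\,i}\,(r\cdot\Delta_i) \Bigm| s \Longrightarrow \Delta \text{ with } \Delta = \sum_{i=0}^{\infty}\Delta_i \,\Bigr\} && \text{by Lemma 6.16} \\
&= \lim_{j\to\infty} \sup\Bigl\{\, \sum_{i=0}^{\infty} \delta_{n_j}^{\,i}\,(r\cdot\Delta_i) \Bigm| s \Longrightarrow \Delta \text{ with } \Delta = \sum_{i=0}^{\infty}\Delta_i \,\Bigr\} \\
&= \lim_{j\to\infty} \sup\Bigl\{\, r\cdot\sum_{i=0}^{\infty} \delta_{n_j}^{\,i}\,\Delta_i \Bigm| s \Longrightarrow \Delta \text{ with } \Delta = \sum_{i=0}^{\infty}\Delta_i \,\Bigr\} \\
&= \lim_{j\to\infty} \sup\{\, r\cdot\Delta \mid s \Longrightarrow_{\delta_{n_j}} \Delta \,\} \\
&= \lim_{j\to\infty} \mathbb{P}^{\delta_{n_j},r}_{\max}(s) \\
&= \lim_{j\to\infty} \mathbb{P}^{\delta_{n_j},dp,r}(s) && \text{by (6.19)} \\
&= \mathbb{P}^{1,dp,r}(s) && \text{by Corollary 6.5}
\end{aligned}
\]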
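A small worked instance may help to see how the discounted sums in the calculation above approach the undiscounted payoff as the discount factor tends to 1. The data are hypothetical and chosen only so that the sums have closed forms: a single state $t$ with $r(t) = 1$, the notation $\overline{t}$ for the point distribution on $t$, and the components $\Delta_i = 2^{-(i+1)}\cdot\overline{t}$. The discounted sum is read here as $\sum_i \delta^i (r\cdot\Delta_i)$.

```latex
% Hypothetical decomposition: Delta_i = 2^{-(i+1)} * point distribution on t, r(t) = 1.
\[
r\cdot\Delta \;=\; \sum_{i=0}^{\infty} r\cdot\Delta_i \;=\; \sum_{i=0}^{\infty} 2^{-(i+1)} \;=\; 1,
\]
\[
\sum_{i=0}^{\infty} \delta^{i}\,(r\cdot\Delta_i)
\;=\; \frac{1}{2}\sum_{i=0}^{\infty}\Bigl(\frac{\delta}{2}\Bigr)^{i}
\;=\; \frac{1}{2-\delta}
\;\longrightarrow\; 1 \quad \text{as } \delta \to 1 .
\]
```

In this instance the discounted value $1/(2-\delta)$ is nondecreasing in $\delta$, which matches the proof's use of a nondecreasing sequence of discount factors.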
6.5.3 Consequences
In this section, we outline two major consequences of Theorem 6.9, which informally says that the set of weak derivatives from a given state is the convex closure of a finite set. The first is straightforward and is explained in the following two results.
Lemma 6.17 (Closure of $\Longrightarrow$) For any state $s$ in a finitary pLTS the set of derivatives $\{\, \Delta \mid s \Longrightarrow \Delta \,\}$ is closed and convex.
Proof Let $dp_1, \ldots, dp_n$ ($n \ge 1$) be all the derivative policies in the finitary pLTS. Consider two sets $C = \{\, Der_{dp_i}(s) \mid 1 \le i \le n \,\}$ and $D = \{\, \Delta \mid s \Longrightarrow \Delta \,\}$. By Theorem 6.9, $D$ coincides with the convex closure of $C$, which is a finite set. By Lemma 6.9, it is also Cauchy closed.
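The convex closure of the finite set $C$ can be written out explicitly, which makes the convexity half of the lemma easy to check by hand. This is a sketch under the assumption that each $Der_{dp_i}(s)$ is a single (sub)distribution; the notation $\mathrm{conv}(C)$, the weights $p_i, q_i$ and the mixing parameter $\lambda$ are introduced here for illustration only.

```latex
% Explicit form of the convex closure of C = { Der_{dp_i}(s) | 1 <= i <= n }.
\[
\mathrm{conv}(C) \;=\; \Bigl\{\, \sum_{i=1}^{n} p_i \cdot Der_{dp_i}(s) \Bigm| p_i \ge 0,\ \sum_{i=1}^{n} p_i = 1 \,\Bigr\}.
\]
% Convexity: mixing two such combinations with 0 <= lambda <= 1 gives another one,
\[
\lambda \sum_{i=1}^{n} p_i \cdot Der_{dp_i}(s) + (1-\lambda) \sum_{i=1}^{n} q_i \cdot Der_{dp_i}(s)
\;=\; \sum_{i=1}^{n} \bigl(\lambda p_i + (1-\lambda) q_i\bigr) \cdot Der_{dp_i}(s),
\]
% where the new weights are again nonnegative and sum to 1.
```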
The restriction here to finitary pLTSs is essential, as the following examples
demonstrate.
 