$$C^{h}(f)(s) \;=\; \bigsqcup_{n\in\mathbb{N}} C^{\delta_n,h}(f)(s). \tag{4.19}$$
We focus on the nontrivial case that $s \xrightarrow{\alpha} \Delta$ for some action $\alpha$ and distribution $\Delta \in \mathcal{D}(S)$.
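$$\begin{aligned}
\bigsqcup_{n\in\mathbb{N}} C^{\delta_n,h}(f)(s)
  &= \bigsqcup_{n\in\mathbb{N}} \delta_n \cdot f(\Delta) \\
  &= f(\Delta) \cdot \bigsqcup_{n\in\mathbb{N}} \delta_n \\
  &= f(\Delta) \cdot 1 \\
  &= C^{h}(f)(s).
\end{aligned}$$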
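The second and third steps of this chain rely on two elementary facts that are left implicit. A short justification, assuming that $f$ takes values in $[0,1]$ so that $f(\Delta) \geq 0$ (the symbols $c$ and $x_n$ are placeholders introduced here):

```latex
% For a constant c >= 0 and a bounded sequence (x_n),
% scaling by c commutes with taking the supremum:
\[
  \bigsqcup_{n\in\mathbb{N}} (x_n \cdot c) \;=\; c \cdot \bigsqcup_{n\in\mathbb{N}} x_n .
\]
% A nondecreasing sequence converging to 1 has supremum 1:
\[
  \delta_1 \leq \delta_2 \leq \cdots
  \quad\text{and}\quad
  \lim_{n\to\infty} \delta_n = 1
  \;\Longrightarrow\;
  \bigsqcup_{n\in\mathbb{N}} \delta_n = 1 .
\]
```

Taking $c = f(\Delta)$ and $x_n = \delta_n$ gives the two middle equalities.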
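Lemma 4.7
Let $h \in [0,1]^{\Omega}$ be a reward vector and $\{\delta_n\}_{n \geq 1}$ be a nondecreasing sequence of discount factors converging to 1. Then
$$V^{h} = \bigsqcup_{n\in\mathbb{N}} V^{\delta_n,h}
\qquad\text{and}\qquad
V^{h}_{\max} = \bigsqcup_{n\in\mathbb{N}} V^{\delta_n,h}_{\max}.$$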
Proof
We only consider $V^{h}$; the case for $V^{h}_{\max}$ is similar. We use the notation $\mathrm{lfp}(f)$ for the least fixed point of the function $f$ over a complete lattice. Recall that $V^{h}$ and $V^{\delta_n,h}$ are the least fixed points of $C^{h}$ and $C^{\delta_n,h}$, respectively, so we need to prove that
$$\mathrm{lfp}(C^{h}) \;=\; \bigsqcup_{n\in\mathbb{N}} \mathrm{lfp}(C^{\delta_n,h}). \tag{4.20}$$
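As a numerical sanity check of (4.20), the following Python sketch approximates the least fixed points by Kleene iteration on a tiny system. Everything concrete here is an assumption made for illustration: the two-state system, the names `make_functional` and `lfp`, the choice $\delta_n = n/(n+1)$, reading $f(\Delta)$ as the expected value of $f$ under $\Delta$, and the values assigned to success states (their reward) and deadlocked states (0). Only the clause $\delta \cdot f(\Delta)$ for $s \xrightarrow{\alpha} \Delta$ is taken from the text.

```python
def make_functional(delta, transitions, rewards):
    """Build C^{delta,h} acting on value vectors (dicts: state -> float)."""
    def C(f):
        out = {}
        for s in f:
            if s in rewards:
                # Assumption: a success state is worth its reward h(omega).
                out[s] = rewards[s]
            elif s in transitions:
                # From the text: s --alpha--> Delta gives delta * f(Delta).
                # Assumption: f(Delta) is the expected value of f under Delta.
                out[s] = delta * sum(p * f[t] for t, p in transitions[s].items())
            else:
                # Assumption: a deadlocked state is worth 0.
                out[s] = 0.0
        return out
    return C


def lfp(C, states, tol=1e-12, max_iter=100_000):
    """Approximate the least fixed point by Kleene iteration from the bottom element."""
    f = {s: 0.0 for s in states}
    for _ in range(max_iter):
        g = C(f)
        if max(abs(g[s] - f[s]) for s in states) < tol:
            return g
        f = g
    return f


# Hypothetical system: s0 moves to s0 or s1 with probability 1/2 each; s1 succeeds.
states = ["s0", "s1"]
transitions = {"s0": {"s0": 0.5, "s1": 0.5}}
rewards = {"s1": 1.0}

# First 49 terms of delta_n = n / (n + 1), a nondecreasing sequence converging to 1.
deltas = [n / (n + 1) for n in range(1, 50)]
values = [lfp(make_functional(d, transitions, rewards), states)["s0"] for d in deltas]
limit = lfp(make_functional(1.0, transitions, rewards), states)["s0"]

print(values[0], values[-1], limit)
```

For this particular system the value at `s0` has the closed form $\delta/(2-\delta)$, so the discounted values rise from $1/3$ at $\delta_1 = 1/2$ towards the undiscounted value 1, in line with (4.20).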
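We now show two inequations.
For any $n \in \mathbb{N}$, we have $\delta_n \leq 1$, so Lemma 4.6 (2) yields $C^{\delta_n,h} \leq C^{h}$. It follows that $\mathrm{lfp}(C^{\delta_n,h}) \leq \mathrm{lfp}(C^{h})$, thus $\bigsqcup_{n\in\mathbb{N}} \mathrm{lfp}(C^{\delta_n,h}) \leq \mathrm{lfp}(C^{h})$.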
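The step from the pointwise ordering of the two functionals to the ordering of their least fixed points is not spelled out. One standard way to see it is sketched below, assuming both functionals are monotone on a complete lattice, so that by the Knaster-Tarski theorem the least fixed point is also the least prefixed point. The names $F$ and $G$ are placeholders.

```latex
% Suppose F \leq G pointwise. Then lfp(G) is a prefixed point of F:
\[
  F(\mathrm{lfp}(G)) \;\leq\; G(\mathrm{lfp}(G)) \;=\; \mathrm{lfp}(G).
\]
% Since lfp(F) is the least prefixed point of F, it lies below lfp(G):
\[
  \mathrm{lfp}(F) \;\leq\; \mathrm{lfp}(G).
\]
```

Instantiating $F = C^{\delta_n,h}$ and $G = C^{h}$ gives the inequality for each $n$, and an upper bound on every member of a family is also an upper bound on its join.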
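For the other direction, that is $\mathrm{lfp}(C^{h}) \leq \bigsqcup_{n\in\mathbb{N}} \mathrm{lfp}(C^{\delta_n,h})$, it suffices to show that $\bigsqcup_{n\in\mathbb{N}} \mathrm{lfp}(C^{\delta_n,h})$ is a prefixed point of $C^{h}$, i.e.
$$C^{h}\Big(\bigsqcup_{n\in\mathbb{N}} \mathrm{lfp}(C^{\delta_n,h})\Big) \;\leq\; \bigsqcup_{n\in\mathbb{N}} \mathrm{lfp}(C^{\delta_n,h}),$$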
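which we derive as follows. Let $\{\delta_n\}_{n \geq 1}$ be a nondecreasing sequence of discount factors converging to 1.
$$\begin{aligned}
C^{h}\Big(\bigsqcup_{n\in\mathbb{N}} \mathrm{lfp}(C^{\delta_n,h})\Big)
  &= \Big(\bigsqcup_{m\in\mathbb{N}} C^{\delta_m,h}\Big)\Big(\bigsqcup_{n\in\mathbb{N}} \mathrm{lfp}(C^{\delta_n,h})\Big)
     && \text{by Lemma 4.6 (3)} \\
  &= \bigsqcup_{m\in\mathbb{N}} C^{\delta_m,h}\Big(\bigsqcup_{n\in\mathbb{N}} \mathrm{lfp}(C^{\delta_n,h})\Big) \\
  &= \bigsqcup_{m\in\mathbb{N}} \bigsqcup_{n\in\mathbb{N}} C^{\delta_m,h}\big(\mathrm{lfp}(C^{\delta_n,h})\big)
     && \text{by Lemma 4.6 (1)} \\
  &= \bigsqcup_{m\in\mathbb{N}} \bigsqcup_{n \geq m} C^{\delta_m,h}\big(\mathrm{lfp}(C^{\delta_n,h})\big)
\end{aligned}$$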