1. $f$ satisfies condition C1. For all $i, j_1, j_2 \in \mathbb{N}$, if $j_1 \le j_2$ then $\delta_{j_1}^{i} \le \delta_{j_2}^{i}$. It follows that
$$|f(i,j_1)| = |\delta_{j_1}^{i}(r\cdot\Delta_i)| \le |\delta_{j_2}^{i}(r\cdot\Delta_i)| = |f(i,j_2)|.$$
2. $f$ satisfies condition C2. For any $i \in \mathbb{N}$, we have
$$\lim_{j\to\infty}|f(i,j)| = \lim_{j\to\infty}|\delta_{j}^{i}(r\cdot\Delta_i)| = |r\cdot\Delta_i|. \tag{6.17}$$
3. $f$ satisfies condition C3. For any $n \in \mathbb{N}$, the partial sum $S_n = \sum_{i=0}^{n}\lim_{j\to\infty}|f(i,j)|$ is bounded because
$$\sum_{i=0}^{n}\lim_{j\to\infty}|f(i,j)| = \sum_{i=0}^{n}|r\cdot\Delta_i| \le \sum_{i=0}^{\infty}|r\cdot\Delta_i| \le \sum_{i=0}^{\infty}|\Delta_i| = |\Delta|,$$
where the first equality is justified by (6.17).
4. $f$ satisfies condition C4. For any $i, j_1, j_2 \in \mathbb{N}$ with $j_1 \le j_2$, suppose we have $f(i,j_1) = \delta_{j_1}^{i}(r\cdot\Delta_i) > 0$. Then $r\cdot\Delta_i > 0$, and it follows immediately that $f(i,j_2) = \delta_{j_2}^{i}(r\cdot\Delta_i) > 0$.
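Therefore, we can use Proposition 2.2 to make the following inference:
$$\lim_{j\to\infty}\sum_{i=0}^{\infty}\delta_{j}^{i}(r\cdot\Delta_i) = \sum_{i=0}^{\infty}\lim_{j\to\infty}\delta_{j}^{i}(r\cdot\Delta_i) = \sum_{i=0}^{\infty} r\cdot\Delta_i = r\cdot\sum_{i=0}^{\infty}\Delta_i = r\cdot\Delta.$$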
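The exchange of limit and sum can be checked numerically on a toy instance. The following is a minimal sketch, not part of the proof: the two states, the reward values, the geometric subdistributions $\Delta_i$, the truncation point `N`, and the particular discount sequence are all illustrative assumptions. It computes a truncated version of $\sum_{i} \delta_j^{i}(r\cdot\Delta_i)$ for increasing $\delta_j$ and compares it with $r\cdot\Delta$.

```python
# Illustrative sketch only: the states, rewards, and subdistributions below
# are made-up assumptions, not data from the text.

def dot(r, dist):
    """Expected reward r . Delta of a subdistribution given as {state: mass}."""
    return sum(r[s] * p for s, p in dist.items())

def discounted_sum(delta, r, deltas):
    """Finite truncation of sum_i delta**i * (r . Delta_i)."""
    return sum(delta ** i * dot(r, d) for i, d in enumerate(deltas))

# Rewards bounded by 1 in absolute value (assumption).
r = {"s0": 1.0, "s1": -0.5}

# Delta_i has total mass 2**-(i+1), so Delta = sum_i Delta_i has mass close to 1.
N = 60  # truncation point standing in for the infinite sum
deltas = [{"s0": 0.75 * 0.5 ** (i + 1), "s1": 0.25 * 0.5 ** (i + 1)}
          for i in range(N)]

# r . Delta, the claimed limit (up to truncation error).
limit = sum(dot(r, d) for d in deltas)

# A nondecreasing sequence of discount factors converging to 1.
discounts = [1 - 1 / (j + 2) for j in (0, 1, 10, 100, 10_000, 1_000_000)]
values = [discounted_sum(d, r, deltas) for d in discounts]

for d, v in zip(discounts, values):
    print(f"delta_j = {d:.6f}  discounted sum = {v:.6f}")
print(f"r . Delta = {limit:.6f}")
```

In this instance every term $r\cdot\Delta_i$ is positive, so the discounted sums increase with $\delta_j$ and approach $r\cdot\Delta$ from below. With rewards of mixed sign the convergence still holds, but the approach need not be monotone.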
Corollary 6.5 Let $\{\delta_j\}_{j=0}^{\infty}$ be a nondecreasing sequence of discount factors converging to $1$. For any derivative policy $dp$ and reward function $r$, it holds that
$$\mathbb{P}^{1,dp,r} = \lim_{j\to\infty}\mathbb{P}^{\delta_j,dp,r}.$$
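Proof We need to show that $\mathbb{P}^{1,dp,r}(s) = \lim_{j\to\infty}\mathbb{P}^{\delta_j,dp,r}(s)$, for any state $s$. Note that for any discount $\delta_j$, each state $s$ enables a unique discounted weak derivation $s \Rightarrow_{\delta_j,dp} \Delta^{j}$ such that $\Delta^{j} = \sum_{i=0}^{\infty}\delta_{j}^{i}\Delta_i$ for some properly related subdistributions $\Delta_i$. Let $\Delta = \sum_{i=0}^{\infty}\Delta_i$. We have $s \Rightarrow_{1,dp} \Delta$. Then we can infer that
$$\begin{aligned}
\lim_{j\to\infty}\mathbb{P}^{\delta_j,dp,r}(s) &= \lim_{j\to\infty} r\cdot\Delta^{j} \\
&= \lim_{j\to\infty} r\cdot\sum_{i=0}^{\infty}\delta_{j}^{i}\Delta_i \\
&= \lim_{j\to\infty}\sum_{i=0}^{\infty}\delta_{j}^{i}(r\cdot\Delta_i) \\
&= r\cdot\Delta && \text{by Lemma 6.16} \\
&= \mathbb{P}^{1,dp,r}(s).
\end{aligned}$$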