\[
\bigsqcup_{n\in\mathbb{N}} \mathcal{C}^{\delta_n,h}(f)(s) \;=\; \mathcal{C}^{h}(f)(s). \tag{4.19}
\]
We focus on the nontrivial case that $s \xrightarrow{\alpha} \Delta$ for some action $\alpha$ and distribution $\Delta \in \mathcal{D}(S)$.
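\begin{align*}
\bigsqcup_{n\in\mathbb{N}} \mathcal{C}^{\delta_n,h}(f)(s)
  &= \bigsqcup_{n\in\mathbb{N}} \delta_n \cdot f(\Delta) \\
  &= f(\Delta) \cdot \bigsqcup_{n\in\mathbb{N}} \delta_n \\
  &= f(\Delta) \cdot 1 \\
  &= \mathcal{C}^{h}(f)(s).
\end{align*}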
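As a concrete check of the derivation above, consider the hypothetical sequence $\delta_n = n/(n+1)$ for $n \ge 1$ and suppose $f(\Delta) = 1/2$. Both values are chosen purely for illustration and do not come from the text.

```latex
\[
\bigsqcup_{n\ge 1} \frac{n}{n+1}\cdot\frac{1}{2}
  \;=\; \frac{1}{2}\cdot\bigsqcup_{n\ge 1}\frac{n}{n+1}
  \;=\; \frac{1}{2}\cdot 1
  \;=\; \frac{1}{2}.
\]
```

No single $n$ attains the value $1/2$ here, since $n/(n+1) < 1$ for every $n$; the equality holds only for the supremum over all $n$.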
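Lemma 4.7 Let $h \in [0,1]^{\Omega}$ be a reward vector and $\{\delta_n\}_{n\ge 1}$ be a nondecreasing sequence of discount factors converging to $1$. Then
• $V^{h}_{\min} = \bigsqcup_{n\in\mathbb{N}} V^{\delta_n,h}_{\min}$
• $V^{h}_{\max} = \bigsqcup_{n\in\mathbb{N}} V^{\delta_n,h}_{\max}$.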
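The statement of Lemma 4.7 can be illustrated numerically with a minimal sketch. Everything concrete here is an assumption made for illustration: the three-state system, the reward vector, the helper names `apply_c` and `lfp`, and the case split of the functional (reward on success, 0 on deadlock, discounted expected value otherwise), of which only the discounted case $\delta \cdot f(\Delta)$ appears in the text. The system has no nondeterministic choice, so under this assumed definition the minimum and maximum values coincide.

```python
# Hypothetical deterministic system (not from the text):
#   s0 --tau--> 1/2*s0 + 1/2*s1,   s1 --omega-->,   s2 has no transitions.
STATES = ["s0", "s1", "s2"]
TRANSITIONS = {"s0": {"s0": 0.5, "s1": 0.5}}  # state -> successor distribution
SUCCESS = {"s1": "omega"}                     # states that report a success action
REWARD = {"omega": 1.0}                       # assumed reward vector h


def apply_c(delta, f):
    """Apply the assumed functional C^{delta,h} once to the valuation f."""
    g = {}
    for s in STATES:
        if s in SUCCESS:
            g[s] = REWARD[SUCCESS[s]]          # success state: collect the reward
        elif s in TRANSITIONS:
            dist = TRANSITIONS[s]
            # discounted expected value of f under the successor distribution
            g[s] = delta * sum(p * f[t] for t, p in dist.items())
        else:
            g[s] = 0.0                         # deadlock: no reward
    return g


def lfp(delta, rounds=200):
    """Approximate the least fixed point by iterating from the zero valuation."""
    f = {s: 0.0 for s in STATES}
    for _ in range(rounds):
        f = apply_c(delta, f)
    return f


# Nondecreasing discount factors 1/2, 2/3, ..., 5/6 approaching 1.
deltas = [1 - 1 / (n + 1) for n in range(1, 6)]
values = [lfp(d)["s0"] for d in deltas]
limit = lfp(1.0)["s0"]

for d, v in zip(deltas, values):
    print(f"delta={d:.3f}  value at s0={v:.6f}")
print(f"delta=1.000  value at s0={limit:.6f}")
```

For this particular system the value at `s0` has the closed form $\delta/(2-\delta)$, so the printed values increase with $\delta$ and approach the undiscounted value from below, which is the behaviour the lemma describes.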
Proof We only consider $V^{h}_{\min}$; the case for $V^{h}_{\max}$ is similar. We use the notation $\mathit{lfp}(f)$ for the least fixed point of the function $f$ over a complete lattice. Recall that $V^{h}_{\min}$ and $V^{\delta_n,h}_{\min}$ are the least fixed points of $\mathcal{C}^{h}$ and $\mathcal{C}^{\delta_n,h}$, respectively, so we need to prove that
\[
\mathit{lfp}(\mathcal{C}^{h}) \;=\; \bigsqcup_{n\in\mathbb{N}} \mathit{lfp}(\mathcal{C}^{\delta_n,h}). \tag{4.20}
\]
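We now show two inequations.
For any $n \in \mathbb{N}$, we have $\delta_n \le 1$, so Lemma 4.6(2) yields $\mathcal{C}^{\delta_n,h} \le \mathcal{C}^{h}$. It follows that $\mathit{lfp}(\mathcal{C}^{\delta_n,h}) \le \mathit{lfp}(\mathcal{C}^{h})$, thus $\bigsqcup_{n\in\mathbb{N}} \mathit{lfp}(\mathcal{C}^{\delta_n,h}) \le \mathit{lfp}(\mathcal{C}^{h})$.
For the other direction, that is $\mathit{lfp}(\mathcal{C}^{h}) \le \bigsqcup_{n\in\mathbb{N}} \mathit{lfp}(\mathcal{C}^{\delta_n,h})$, it suffices to show that $\bigsqcup_{n\in\mathbb{N}} \mathit{lfp}(\mathcal{C}^{\delta_n,h})$ is a prefixed point of $\mathcal{C}^{h}$, i.e.
\[
\mathcal{C}^{h}\Bigl(\bigsqcup_{n\in\mathbb{N}} \mathit{lfp}(\mathcal{C}^{\delta_n,h})\Bigr) \;\le\; \bigsqcup_{n\in\mathbb{N}} \mathit{lfp}(\mathcal{C}^{\delta_n,h}),
\]
which we derive as follows. Let $\{\delta_n\}_{n\ge 1}$ be a nondecreasing sequence of discount factors converging to 1.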
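\begin{align*}
\mathcal{C}^{h}\Bigl(\bigsqcup_{n\in\mathbb{N}} \mathit{lfp}(\mathcal{C}^{\delta_n,h})\Bigr)
  &= \Bigl(\bigsqcup_{m\in\mathbb{N}} \mathcal{C}^{\delta_m,h}\Bigr)\Bigl(\bigsqcup_{n\in\mathbb{N}} \mathit{lfp}(\mathcal{C}^{\delta_n,h})\Bigr)
     && \text{by Lemma 4.6(3)} \\
  &= \bigsqcup_{m\in\mathbb{N}} \mathcal{C}^{\delta_m,h}\Bigl(\bigsqcup_{n\in\mathbb{N}} \mathit{lfp}(\mathcal{C}^{\delta_n,h})\Bigr) \\
  &= \bigsqcup_{m\in\mathbb{N}} \bigsqcup_{n\in\mathbb{N}} \mathcal{C}^{\delta_m,h}\bigl(\mathit{lfp}(\mathcal{C}^{\delta_n,h})\bigr)
     && \text{by Lemma 4.6(1)} \\
  &= \bigsqcup_{m\in\mathbb{N}} \bigsqcup_{n\ge m} \mathcal{C}^{\delta_m,h}\bigl(\mathit{lfp}(\mathcal{C}^{\delta_n,h})\bigr)
\end{align*}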