Theorem 6.10 In a finitary pLTS, for any reward function $r$ there exists a derivative policy $dp$ such that
$$\mathbb{P}^{1,r}_{\max} = \mathbb{P}^{1,dp,r}.$$
Proof Let $r$ be a reward function. By Proposition 6.6, for every discount factor $\delta < 1$ there exists a max-seeking derivative policy $dp$ with respect to $\delta$ and $r$ such that
$$\mathbb{P}^{\delta,r}_{\max} = \mathbb{P}^{\delta,dp,r}. \tag{6.18}$$
Since the pLTS is finitary, there are finitely many different static derivative policies. There must exist a derivative policy $dp$ such that (6.18) holds for infinitely many discount factors. In other words, for every nondecreasing sequence $\{\delta_n\}_{n=0}^{\infty}$ converging to 1, there exists a subsequence $\{\delta_{n_j}\}_{j=0}^{\infty}$ and a derivative policy $dp$ such that
$$\mathbb{P}^{\delta_{n_j},r}_{\max} = \mathbb{P}^{\delta_{n_j},dp,r} \quad \text{for all } j \geq 0. \tag{6.19}$$
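For any state $s$, we infer as follows.
$$
\begin{aligned}
\mathbb{P}^{1,r}_{\max}(s)
&= \sup\Big\{\, r \cdot \Delta \;\Big|\; s \Longrightarrow \Delta \,\Big\} \\
&= \sup\Big\{\, \lim_{j\to\infty} \sum_{i=0}^{\infty} \delta_{n_j}^{\,i}\,(r \cdot \Delta_i) \;\Big|\; s \Longrightarrow \Delta \text{ with } \Delta = \sum_{i=0}^{\infty} \Delta_i \,\Big\} \\
&= \lim_{j\to\infty} \sup\Big\{\, \sum_{i=0}^{\infty} \delta_{n_j}^{\,i}\,(r \cdot \Delta_i) \;\Big|\; s \Longrightarrow \Delta \text{ with } \Delta = \sum_{i=0}^{\infty} \Delta_i \,\Big\} && \text{by Lemma 6.16} \\
&= \lim_{j\to\infty} \sup\Big\{\, r \cdot \sum_{i=0}^{\infty} \delta_{n_j}^{\,i}\,\Delta_i \;\Big|\; s \Longrightarrow \Delta \text{ with } \Delta = \sum_{i=0}^{\infty} \Delta_i \,\Big\} \\
&= \lim_{j\to\infty} \sup\Big\{\, r \cdot \Delta' \;\Big|\; s \Longrightarrow_{\delta_{n_j}} \Delta' \,\Big\} \\
&= \lim_{j\to\infty} \mathbb{P}^{\delta_{n_j},r}_{\max}(s) \\
&= \lim_{j\to\infty} \mathbb{P}^{\delta_{n_j},dp,r}(s) && \text{by (6.19)} \\
&= \mathbb{P}^{1,dp,r}(s) && \text{by Corollary 6.5}
\end{aligned}
$$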
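To illustrate the statement of Theorem 6.10, the following minimal sketch works through a hypothetical toy pLTS that is not taken from the text. It assumes that every τ-move is deterministic, that a static derivative policy can be modelled as a partial map from states to successors, and that the discounted payoff of a policy is the reward of the state where it stops, scaled by the discount factor once per step taken. All names (`REWARD`, `POLICIES`, `payoff`, `max_seeking`) are invented for the example, and the maximum is taken over static policies only, which is a simplification.

```python
from collections import Counter
from fractions import Fraction

# Hypothetical toy pLTS (an assumption for illustration, not from the text):
#   s0 --tau--> a              (a has no outgoing moves)
#   s0 --tau--> b --tau--> c   (c has no outgoing moves)
REWARD = {"s0": Fraction(0), "a": Fraction(1, 2), "b": Fraction(0), "c": Fraction(3, 5)}

# A static policy maps a state to its chosen successor; a missing key means "stop here".
POLICIES = {
    "stop": {},
    "via_a": {"s0": "a"},
    "stop_at_b": {"s0": "b"},
    "via_c": {"s0": "b", "b": "c"},
}

def payoff(policy, delta, state="s0"):
    """Reward of the stopping state, scaled by delta once per tau-step taken."""
    steps = 0
    while state in policy:  # the toy system is acyclic, so this loop terminates
        state = policy[state]
        steps += 1
    return delta ** steps * REWARD[state]

def max_seeking(delta):
    """Name of a static policy with maximal discounted payoff for this delta."""
    return max(POLICIES, key=lambda name: payoff(POLICIES[name], delta))

# A nondecreasing sequence of discount factors converging to 1.
deltas = [1 - Fraction(1, n + 2) for n in range(50)]
winners = [max_seeking(d) for d in deltas]

# Finitely many policies, so one of them recurs along the sequence.
recurring = Counter(winners).most_common(1)[0][0]

# Its undiscounted payoff is compared with the undiscounted maximum.
best_at_one = max(payoff(p, Fraction(1)) for p in POLICIES.values())
print(recurring, payoff(POLICIES[recurring], Fraction(1)) == best_at_one)
```

In this toy, the one-step route pays more for small discount factors, and the two-step route takes over once the discount factor exceeds 5/6. The two-step policy is therefore the one that recurs along the sequence, and its undiscounted payoff equals the undiscounted maximum.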
6.5.3 Consequences
In this section, we outline two major consequences of Theorem 6.9, which informally states that the set of weak derivatives from a given state is the convex closure of a finite set. The first is straightforward and is explained in the following two results.
Lemma 6.17 (Closure of $\Longrightarrow$) For any state $s$ in a finitary pLTS the set of derivatives $\{\Delta \mid s \Longrightarrow \Delta\}$ is closed and convex.
Proof Let $dp_1, \ldots, dp_n$ ($n \geq 1$) be all the derivative policies in the finitary pLTS. Consider two sets $C = \{ Der_{dp_i}(s) \mid 1 \leq i \leq n \}$ and $D = \{ \Delta \mid s \Longrightarrow \Delta \}$. By Theorem 6.9, $D$ coincides with the convex closure of $C$, which is a finite set. By Lemma 6.9, it is also Cauchy closed.
The restriction here to finitary pLTSs is essential, as the following examples
demonstrate.