Information Technology Reference
In-Depth Information
s
,
, is generable by the convex closure of a finite set. The proof makes
use of the separation theorem, Theorem 2.7.
{
ʔ
|
s
⃒
ʔ
}
Theorem 6.9 (Finite Generability)
Let P
be the finite set of
deriva
ti
ve policies in a finitary pLTS. For any state s and subdistribution ʔ in the
pLTS, s
={
dp
1
,
...
,
dp
k
}
⃒
ʔ if and only if ʔ
∈{
Der
dp
i
(
s
)
|
1
≤
i
≤
k
}
.
Proof
(
⃒
) For convenience, l
e
t
X
denote the set
{
Der
dp
i
(
s
)
|
1
≤
i
≤
k
}
. Sup-
pose, for a contradiction, that
s
ʔ
for some
ʔ
not in
X
. Recall that we are
assuming that the underlying state space is
S
⃒
={
x
1
,
...x
n
}
so that
X
is a subset of
1, 1]
n
, and by definition it is convex; by Lemma
6.9
it follows that
X
is (Cauchy) closed.
By the separation theorem,
ʔ
can be separated from
X
by a hyperplane
H
. What
this means is that there is some function
r
H
:
S
n
. It is trivially bounded by [
R
−
ₒ R
and constant
c
∈ R
such that
either
(a)
r
H
·
ʘ<c
for all
ʘ
∈
X
and
r
H
·
ʔ>c
or,
(b)
r
H
·
ʘ>c
for all
ʘ
∈
X
and
r
H
·
ʔ<c
In fact, from case (b) we can obtain case (a) by negating both the constant
c
and
the components of the function
r
H
; so we can assume (a) to be true. Moreover, by
scaling with respect to the largest
r
H
(
s
i
),
≤
≤
1
i
n
, we can assume that
r
H
is
actually a reward function.
In particular (a) means that
r
H
·
Der
dp
i
(
s
)
<c
, and therefore that
r
H
·
Der
dp
i
(
s
)
<
r
H
·
ʔ
for each of derivative polices
dp
i
. But this contradicts Theorem
6.8
which claims
that there must be some 1
r
max
(
s
)
≤
i
≤
n
such that
r
H
·
Der
dp
i
(
s
)
= P
≥
r
H
·
ʔ
.
(
⃐
) Note that by definition
s
⃒
Der
dp
i
(
s
) for every derivative policy
dp
i
with
1
≤
i
≤
k
. By Proposition
6.3
and Remark
6.1
,
the relation
⃒
is convex. It follows
that if
ʔ
Extreme policies, as given in Definition 4.8, are particular kinds of derivative
policies, designed for pLTSs of the form
∈{
Der
dp
1
(
s
),
...
,
Der
dp
k
}
, then
s
⃒
ʔ
.
R
,
ʩ
˄
,
ₒ
R
. The significant constraint on
˄
−ₒ
extreme policies is that for any state
s
if
s
then
ep
(
s
) must be defined. As a
consequence, in the computation determined by
ep
if a state can contribute to the
computation at any stage it must contribute.
Lemma 6.11
Let
ep
be any extreme policy. Then s
⃒
ep
ʔ implies s
⃒
ʔ.
Proof
Consider the derivation of
ʔ
as in Definition
6.4
, and determined by the
extreme policy
ep
. Since
ʔ
=
k
≥
0
ʔ
k
it is sufficient to show that each
ʔ
k
is
˄
−ₒ
implies
s
∈
ʔ
k
stable, that is
s
.
Since
ep
is an extreme policy, Definition 4.8 ensures that
ep
(
s
) is defined. From
the definition of a computation, Definition
6.4
, we know
ʔ
k
=
ʔ
k
ʔ
k
+
and
since the computation is guided by the policy
ep
we have that
ʔ
k
(
s
)
=
ʔ
k
(
s
). An
immediate consequence is that
ʔ
k
(
s
)
=
0.
Search WWH ::
Custom Search