Information Technology Reference
In-Depth Information
˄
−ₒ
s
k
+
1
•
s
k
0
2
k
↕
ˉ.
0
.
˄
−ₒ
•
s
k
max
(
s
k
)tobe
h
(
ˉ
) for every
k
, and a max-seeking extreme
policy is determined by
ep
(
s
k
)
We have calculated
V
s
k
+
1
; indeed this is essentially the only such
policy. However the resolution associated with this policy does not realise
=
max
,as
V
ep
(
s
k
)
V
0.
Nevertheless, we will show that if we restrict attention to finitary pLTSs, then
there will always exist some static resolution that realises
=
max
. The proof relies on
techniques used in Markov process theory [
9
], and unlike that of Proposition
4.5
is
nonconstructive; we simply prove that some such resolution exists, without actually
showing how to construct it. Although such techniques are relatively standard in
the theory of Markov decision processes, see [
9
] for example, they are virtually
unknown in concurrency theory. So we give a detailed exposition that cumulates in
Theorem
4.3
.
Consider a set of all functions from a finite set
R
to [0, 1], denoted by [0, 1]
R
, and
the distance function
d
over [0, 1]
R
V
defined by
d
(
f
,
g
)
=
max
|
f
(
r
)
−
g
(
r
)
|
r
∈
R
.We
can check that ([0, 1]
R
,
d
) constitutes a complete metric space. Let
ʴ
∈
(0, 1) be a
h
discount factor. The discounted version of the function
C
in (
4.10
),
ʴ
,
h
C
:(
R
ₒ
[0, 1])
ₒ
(
R
ₒ
[0, 1]) defined by
⊧
⊨
ˉ
−ₒ
h
(
ˉ
)
if
r
ʴ
,
h
(
f
)(
r
)
ˉ
˄
C
=
(4.18)
0
if
r
−ₒ
and
r
−ₒ
⊩
˄
−ₒ
ˉ
·
−ₒ
ʴ
Exp
ʔ
(
f
) f
r
and
r
ʔ
is a contraction mapping with constant
ʴ
. It follows from the Banach fixed-point
theorem (cf. Theorem 2.8) that
ʴ
,
h
C
has a unique fixed point when
ʴ<
1, which we
ʴ
,
h
. On the other hand, it can be shown that
ʴ
,
h
is a continuous function
denote by
V
C
over the complete lattice [0, 1]
R
.So
ʴ
,
h
, as the least fixed point of
ʴ
,
h
, has the
V
C
=
n
∈N
V
ʴ
,
h
ʴ
,
h
,
n
, where
ʴ
,
h
,
n
is the
n
th iteration of
ʴ
,
h
over
characterisation
V
V
C
↥
.
ʴ
,
h
ʴ
,
h
Note that if there is no discount, i.e.
ʴ
=
1, we see that
C
and
V
coincide with
ʴ
,
h
min
and
h
and
h
, respectively. Similarly, we can define
ʴ
,
max
.
C
V
V
V
ʴ
,
h
ʴ
,
max
have the following properties.
The functions
C
and
V
[0, 1]
ʩ
Lemma 4.6
Let h
∈
be any reward vector.
ʴ
,
h
ʴ
,
h
1. For any ʴ
∈
(0, 1]
, the functions
C
and
C
max
are continuous;
ʴ
1
,
h
ʴ
2
,
h
and
ʴ
1
,
h
max
ʴ
2
,
h
2. If ʴ
1
,
ʴ
2
∈
(0, 1]
and ʴ
1
≤
ʴ
2
, then we have
C
≤
C
C
≤
C
max
;
{
ʴ
n
}
n
≥
1
be a nondecreasing sequence of discount factors converging to
1
.It
3. Let
holds that
n
∈N
C
and
n
∈N
C
ʴ
n
,
h
=
C
h
ʴ
n
,
h
max
=
C
max
.
Proof
We only consider
C
; the case for
C
max
is similar.
1. Similar to the proof of Lemma
4.3
.
2. Straightforward by the definition of
C
.
3. For any
f
∈
S
ₒ
[0, 1] and
s
∈
S
, we show that
Search WWH ::
Custom Search