Information Technology Reference
In-Depth Information
˄
−ₒ s k + 1
s k
0 2 k
ˉ. 0 .
˄
−ₒ
s k
max ( s k )tobe h ( ˉ ) for every k , and a max-seeking extreme
policy is determined by ep ( s k )
We have calculated
V
s k + 1 ; indeed this is essentially the only such
policy. However the resolution associated with this policy does not realise
=
max ,as
V
ep ( s k )
V
0.
Nevertheless, we will show that if we restrict attention to finitary pLTSs, then
there will always exist some static resolution that realises
=
max . The proof relies on
techniques used in Markov process theory [ 9 ], and unlike that of Proposition 4.5 is
nonconstructive; we simply prove that some such resolution exists, without actually
showing how to construct it. Although such techniques are relatively standard in
the theory of Markov decision processes, see [ 9 ] for example, they are virtually
unknown in concurrency theory. So we give a detailed exposition that cumulates in
Theorem 4.3 .
Consider a set of all functions from a finite set R to [0, 1], denoted by [0, 1] R , and
the distance function d over [0, 1] R
V
defined by d ( f , g )
= max | f ( r )
g ( r )
| r R .We
can check that ([0, 1] R , d ) constitutes a complete metric space. Let ʴ
(0, 1) be a
h
discount factor. The discounted version of the function
C
in ( 4.10 ),
ʴ , h
C
:( R
[0, 1])
( R
[0, 1]) defined by
ˉ
−ₒ
h ( ˉ )
if r
ʴ , h ( f )( r )
ˉ
˄
C
=
(4.18)
0
if r
−ₒ
and r
−ₒ
˄
−ₒ
ˉ
·
−ₒ
ʴ
Exp ʔ ( f ) f r
and r
ʔ
is a contraction mapping with constant ʴ . It follows from the Banach fixed-point
theorem (cf. Theorem 2.8) that
ʴ , h
C
has a unique fixed point when ʴ< 1, which we
ʴ , h . On the other hand, it can be shown that
ʴ , h is a continuous function
denote by
V
C
over the complete lattice [0, 1] R .So
ʴ , h , as the least fixed point of
ʴ , h , has the
V
C
= n ∈N V
ʴ , h
ʴ , h , n , where
ʴ , h , n is the n th iteration of
ʴ , h over
characterisation
V
V
C
.
ʴ , h
ʴ , h
Note that if there is no discount, i.e. ʴ
=
1, we see that
C
and
V
coincide with
ʴ , h
min and
h and
h , respectively. Similarly, we can define
ʴ , max .
C
V
V
V
ʴ , h
ʴ , max have the following properties.
The functions
C
and
V
[0, 1] ʩ
Lemma 4.6
Let h
be any reward vector.
ʴ , h
ʴ , h
1. For any ʴ
(0, 1] , the functions
C
and
C
max are continuous;
ʴ 1 , h
ʴ 2 , h and
ʴ 1 , h
max
ʴ 2 , h
2. If ʴ 1 , ʴ 2
(0, 1] and ʴ 1
ʴ 2 , then we have
C
C
C
C
max ;
{
ʴ n } n 1 be a nondecreasing sequence of discount factors converging to 1 .It
3. Let
holds that n ∈N C
and n ∈N C
ʴ n , h
= C
h
ʴ n , h
max
= C
max .
Proof
We only consider
C
; the case for
C max is similar.
1. Similar to the proof of Lemma 4.3 .
2. Straightforward by the definition of
C
.
3. For any f S
[0, 1] and s S , we show that
 
Search WWH ::




Custom Search