6.5.2 Realising Payoffs
In this section, we expose some less obvious properties of derivations, relating to their behaviour at infinity. One important property is that if we associate each state with a reward, which is a value in $[-1, 1]$, then the maximum payoff realisable by following all possible weak derivations can in fact be achieved by some static derivative policy, as stated in Theorem 6.8. The property depends on our working within finitary pLTSs, that is, ones in which the state space is finite and the (unlifted) transition relation is finite-branching. We first need to formalise some concepts such as discounted weak derivation and discounted payoff.
Definition 6.14 (Discounted Weak Derivation) The discounted weak derivation $\Delta \Longrightarrow_{\delta} \Delta'$ for discount factor $\delta$ ($0 < \delta \leq 1$) is obtained from a weak derivation by discounting each $\tau$ transition by $\delta$. That is, there is a collection of subdistributions $\Delta_k$, $\Delta_k^{\rightarrow}$ and $\Delta_k^{\times}$ satisfying
$$
\begin{array}{rcl}
\Delta \;=\; \Delta_0 & = & \Delta_0^{\rightarrow} + \Delta_0^{\times}\\
\Delta_0^{\rightarrow} \xrightarrow{\;\tau\;} \Delta_1 & = & \Delta_1^{\rightarrow} + \Delta_1^{\times}\\
 & \vdots & \\
\Delta_k^{\rightarrow} \xrightarrow{\;\tau\;} \Delta_{k+1} & = & \Delta_{k+1}^{\rightarrow} + \Delta_{k+1}^{\times}\\
 & \vdots &
\end{array}
$$
such that $\Delta' = \sum_{k=0}^{\infty} \delta^{k} \Delta_k^{\times}$.
It is trivial that the relation $\Longrightarrow_{1}$ coincides with $\Longrightarrow$ given in Definition 6.4.
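For instance (a small illustration, not taken from the original text), suppose $s \xrightarrow{\;\tau\;} \overline{t}$ is the only transition of $s$ and $t$ has no $\tau$-transitions. Taking $\Delta_0^{\rightarrow} = \overline{s}$, $\Delta_0^{\times} = \varepsilon$, $\Delta_1 = \Delta_1^{\times} = \overline{t}$ and all later $\Delta_k$ empty gives $\overline{s} \Longrightarrow_{\delta} \delta \cdot \overline{t}$, whereas taking $\Delta_0^{\times} = \overline{s}$ gives $\overline{s} \Longrightarrow_{\delta} \overline{s}$. For $\delta < 1$ the first derivation loses mass, which is exactly the effect of discounting the single $\tau$ step.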
Definition 6.15 (Discounted Payoff) Given a pLTS with state space $S$, a discount $\delta$, and reward function $r$, we define the discounted payoff function $\mathbb{P}^{\delta,r}_{\max} : S \rightarrow \mathbb{R}$ by
$$
\mathbb{P}^{\delta,r}_{\max}(s) \;=\; \sup\,\{\, r \cdot \Delta' \mid \overline{s} \Longrightarrow_{\delta} \Delta' \,\}
$$
and we will generalise it to be of type $\mathcal{D}_{\mathrm{sub}}(S) \rightarrow \mathbb{R}$ by letting
$$
\mathbb{P}^{\delta,r}_{\max}(\Delta) \;=\; \sum_{s \in \lceil \Delta \rceil} \Delta(s) \cdot \mathbb{P}^{\delta,r}_{\max}(s).
$$
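The following is a minimal computational sketch (not from the original text; all names and the data encoding are hypothetical) of how $\mathbb{P}^{\delta,r}_{\max}$ could be approximated on a finitary pLTS with $0 < \delta < 1$. It assumes the standard fixed-point characterisation of discounted maximal payoffs, namely that at each state the supremum is realised by either stopping and collecting the reward or taking one best $\tau$-move and discounting; that characterisation is assumed here rather than derived from Definition 6.15.

def discounted_payoff_max(states, tau, r, delta, iterations=1000):
    # states: list of states
    # tau: dict mapping a state to the list of its tau-successor
    #      distributions, each encoded as a dict {state: probability}
    # r: dict mapping states to rewards in [-1, 1]
    # delta: discount factor with 0 < delta < 1
    P = {s: 0.0 for s in states}
    for _ in range(iterations):
        P_new = {}
        for s in states:
            best = r[s]                   # option 1: stop here, collect r(s)
            for Theta in tau.get(s, []):  # option 2: one more tau step, discounted
                best = max(best, delta * sum(p * P[t] for t, p in Theta.items()))
            P_new[s] = best
        P = P_new
    return P

def payoff_of_subdistribution(Delta, P):
    # Lift the state-indexed table P to a subdistribution Delta,
    # as in the second clause of Definition 6.15.
    return sum(p * P[s] for s, p in Delta.items())

# Hypothetical example: s0 has one tau-move, to {s1: 0.5, s2: 0.5};
# s1 and s2 have no tau-moves.
P = discounted_payoff_max(
    states=["s0", "s1", "s2"],
    tau={"s0": [{"s1": 0.5, "s2": 0.5}]},
    r={"s0": 0.0, "s1": 1.0, "s2": 0.5},
    delta=0.9)
# P["s0"] is approximately 0.9 * (0.5 * 1.0 + 0.5 * 0.5) = 0.675 > r(s0) = 0,
# so taking the tau-move is better than stopping at s0.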
Definition 6.16 (Max-seeking Policy) Given a pLTS, discount $\delta$ and reward function $r$, we say a static derivative policy $dp$ is max-seeking with respect to $\delta$ and $r$ if for all $s$ the following requirements are met.
1. If $dp(s)$ is undefined, then $r(s) \geq \delta \cdot \mathbb{P}^{\delta,r}_{\max}(\Delta_1)$ for all $s \xrightarrow{\;\tau\;} \Delta_1$.
2. If $dp(s) = \Delta$ then
   a) $\delta \cdot \mathbb{P}^{\delta,r}_{\max}(\Delta) \geq r(s)$ and
   b) $\mathbb{P}^{\delta,r}_{\max}(\Delta) \geq \mathbb{P}^{\delta,r}_{\max}(\Delta_1)$ for all $s \xrightarrow{\;\tau\;} \Delta_1$.
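Continuing the hedged sketch above (same hypothetical encoding; here dp maps each state either to None, standing for $dp(s)$ being undefined, or to one of its $\tau$-successor distributions), the two clauses of Definition 6.16 translate directly into a check against a table P approximating $\mathbb{P}^{\delta,r}_{\max}$:

def is_max_seeking(dp, states, tau, r, delta, P, eps=1e-9):
    def payoff(Theta):
        # P^{delta,r}_max lifted to a distribution Theta (Definition 6.15)
        return sum(p * P[t] for t, p in Theta.items())
    for s in states:
        moves = tau.get(s, [])
        if dp.get(s) is None:
            # Clause 1: stopping at s must be at least as good as every tau-move.
            if any(r[s] + eps < delta * payoff(Theta) for Theta in moves):
                return False
        else:
            Delta = dp[s]
            # Clause 2(a): the chosen move must be at least as good as stopping.
            if delta * payoff(Delta) + eps < r[s]:
                return False
            # Clause 2(b): the chosen move must be as good as any other tau-move.
            if any(payoff(Delta) + eps < payoff(Theta) for Theta in moves):
                return False
    return True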