6.5.2 Realising Payoffs
In this section, we expose some less obvious properties of derivations, relating to their behaviour at infinity. One important property is that if we associate each state with a reward, which is a value in [−1, 1], then the maximum payoff realisable by following all possible weak derivations can in fact be achieved by some static derivative policy, as stated in Theorem 6.8. The property depends on our working within finitary pLTSs, that is, ones in which the state space is finite and the (unlifted) transition relation is finite-branching. We first need to formalise some concepts such as discounted weak derivation and discounted payoff.
Definition 6.14 (Discounted Weak Derivation) The discounted weak derivation Δ ⇒_δ Δ' for discount factor δ (0 < δ ≤ 1) is obtained from a weak derivation by discounting each τ transition by δ. That is, there is a collection of subdistributions Δ_k, Δ_k^→ and Δ_k^× satisfying

  Δ = Δ_0^→ + Δ_0^×
  Δ_0^→ −τ→ Δ_1,   Δ_1 = Δ_1^→ + Δ_1^×
  ⋮
  Δ_k^→ −τ→ Δ_{k+1},   Δ_{k+1} = Δ_{k+1}^→ + Δ_{k+1}^×
  ⋮

such that Δ' = Σ_{k=0}^∞ δ^k · Δ_k^×.

It is trivial that the relation ⇒_1 coincides with the relation ⇒ given in Definition 6.4.
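To make the summation Δ' = Σ_{k=0}^∞ δ^k · Δ_k^× concrete, here is a minimal sketch that computes the target of a discounted weak derivation from a given (finite) decomposition. The encoding of subdistributions as dicts from states to masses, and the function name, are illustrative assumptions, not fixed by the text.

```python
# Sketch: computing the target Delta' of a discounted weak derivation.
# Subdistributions are represented as dicts {state: mass} -- an
# illustrative encoding; the text itself fixes no representation.

def discounted_target(stopped_parts, delta):
    """Given the sequence of stopping components Delta_k^x and a discount
    factor delta in (0, 1], return Delta' = sum_k delta^k * Delta_k^x."""
    result = {}
    for k, part in enumerate(stopped_parts):
        for s, mass in part.items():
            result[s] = result.get(s, 0.0) + (delta ** k) * mass
    return result

# Example: all mass takes one tau step and then stops in state 't', so
# Delta_0^x is empty and Delta_1^x = {'t': 1.0}.
target = discounted_target([{}, {'t': 1.0}], delta=0.5)
# The stopped mass is discounted once; with delta = 1 the result would
# coincide with the target of the ordinary weak derivation.
```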
Definition 6.15 (Discounted Payoff) Given a pLTS with state space S, a discount δ, and reward function r, we define the discounted payoff function 𝒫^{δ,r}_max : S → ℝ by

  𝒫^{δ,r}_max(s) = sup { r · Δ' | s̄ ⇒_δ Δ' }

and we will generalise it to be of type D_sub(S) → ℝ by letting

  𝒫^{δ,r}_max(Δ) = Σ_{s ∈ ⌈Δ⌉} Δ(s) · 𝒫^{δ,r}_max(s).
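In a finitary pLTS with δ < 1, a weak derivation at s either stops (earning r(s) for the mass that stops) or invests its mass in a τ step whose continuation is discounted by δ, so 𝒫^{δ,r}_max satisfies the fixed-point equation 𝒫(s) = max( r(s), max_{s −τ→ Δ} δ · 𝒫(Δ) ). The following value-iteration sketch exploits this; the pLTS encoding and function names are illustrative assumptions, not taken from the text.

```python
# Value iteration for the discounted payoff function P^{delta,r}_max on a
# finitary pLTS.  'taus' maps each state to the list of its tau-successor
# distributions, each a dict {state: probability} (illustrative encoding).

def payoff_max(states, taus, r, delta, iters=1000):
    P = {s: 0.0 for s in states}
    for _ in range(iters):
        # Bellman-style update: stop and collect r(s), or take the best
        # tau step, discounted by delta.  For delta < 1 this operator is
        # a contraction, so the iteration converges to the unique fixed point.
        P = {s: max(r[s],
                    max((delta * sum(d[t] * P[t] for t in d)
                         for d in taus.get(s, [])),
                        default=float('-inf')))
             for s in states}
    return P

# Tiny example: s0 --tau--> s1 deterministically; only s1 carries reward.
r = {'s0': 0.0, 's1': 1.0}
taus = {'s0': [{'s1': 1.0}]}
P = payoff_max(['s0', 's1'], taus, r, delta=0.9)
# P['s1'] = 1.0 (stop immediately); P['s0'] = 0.9 (one discounted step).
```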
Definition 6.16 (Max-seeking Policy) Given a pLTS, discount δ and reward function r, we say a static derivative policy dp is max-seeking with respect to δ and r if for all s the following requirements are met.

1. If dp(s)↑, then r(s) ≥ δ · 𝒫^{δ,r}_max(Δ_1) for all s −τ→ Δ_1.
2. If dp(s) = Δ, then
   a) δ · 𝒫^{δ,r}_max(Δ) ≥ r(s), and
   b) 𝒫^{δ,r}_max(Δ) ≥ 𝒫^{δ,r}_max(Δ_1) for all s −τ→ Δ_1.
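Given the payoff function, a max-seeking policy can be read off directly: leave dp(s) undefined when r(s) already dominates every discounted successor payoff, and otherwise pick a payoff-maximal τ-derivative, which satisfies clauses 2a and 2b by construction. A sketch of this extraction, under an illustrative encoding (distributions as dicts {state: probability}, dp(s)↑ modelled as None; none of these names come from the text):

```python
# Extracting a max-seeking static derivative policy dp from a precomputed
# payoff function P (standing in for P^{delta,r}_max).  'taus' maps each
# state to its list of tau-successor distributions (illustrative encoding).

def expected(P, dist):
    """Payoff of a distribution: sum over its support of mass times payoff."""
    return sum(p * P[t] for t, p in dist.items())

def max_seeking_policy(states, taus, r, P, delta):
    dp = {}
    for s in states:
        succs = taus.get(s, [])
        best = max(succs, key=lambda d: expected(P, d), default=None)
        if best is None or r[s] >= delta * expected(P, best):
            dp[s] = None   # clause 1: stopping at s is already optimal
        else:
            dp[s] = best   # clause 2: a discounted tau step pays strictly more
    return dp

# Tiny example: s0 --tau--> s1 deterministically; only s1 carries reward.
r = {'s0': 0.0, 's1': 1.0}
taus = {'s0': [{'s1': 1.0}]}
P = {'s0': 0.9, 's1': 1.0}   # discounted payoffs for delta = 0.9
dp = max_seeking_policy(['s0', 's1'], taus, r, P, delta=0.9)
# dp moves from s0 to s1 and is undefined (stops) at s1.
```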