Lemma 6.10
Let dp be a derivative policy in a pLTS. Then
(a) If $s \Longrightarrow_{dp} \Delta$ and $s \Longrightarrow_{dp} \Theta$ then $\Delta = \Theta$.
(b) For every state s there exists some $\Delta$ such that $s \Longrightarrow_{dp} \Delta$.
Proof
To prove part (a), consider the derivations of $s \Longrightarrow \Delta$ and $s \Longrightarrow \Theta$ as in Definition 6.4, via the subdistributions $\Delta_k, \Delta_k^{\rightarrow}, \Delta_k^{\times}$ and $\Theta_k, \Theta_k^{\rightarrow}, \Theta_k^{\times}$, respectively. Because both derivations are guided by the same derivative policy dp, it is easy to show by induction on k that
$$\Delta_k = \Theta_k, \qquad \Delta_k^{\rightarrow} = \Theta_k^{\rightarrow}, \qquad \Delta_k^{\times} = \Theta_k^{\times},$$
from which $\Delta = \Theta$ follows immediately.
To prove (b), we use dp to generate subdistributions $\Delta_k, \Delta_k^{\rightarrow}, \Delta_k^{\times}$ for each $k \geq 0$ satisfying the constraints of Definition 6.4 and simultaneously those in Definition 6.12 above. The result will then follow by letting $\Delta$ be $\sum_{k \geq 0} \Delta_k^{\times}$.
The net effect of this lemma is that a derivative policy dp determines a total function from states to derivations. Let $\mathsf{Der}_{dp} : S \rightarrow \mathcal{D}(S)$ be defined by letting $\mathsf{Der}_{dp}(s)$ be the unique $\Delta$ such that $s \Longrightarrow_{dp} \Delta$.
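To make this construction concrete, here is a small sketch (our own illustration, not taken from the text) of how a derivative policy determines the unique weak derivative of a state. It assumes a simplified representation in which a subdistribution is a finite mapping from states to probabilities, and the policy dp is a partial mapping, given as a dict, sending each state at which it chooses to perform an internal move to the distribution reached by that move; states outside its domain stop. The function name der_dp and the cut-off parameter are hypothetical.

```python
# A minimal sketch (hypothetical representation, not from the text):
# subdistributions are dicts mapping states to probabilities, and a
# derivative policy dp is a dict mapping each state at which it chooses
# to move to the distribution reached by that internal move.

def der_dp(s, dp, steps=1000):
    """Approximate Der_dp(s): the unique Delta with s ==>_dp Delta.

    Follows the construction in part (b) of the lemma: split each Delta_k
    into the part on states where dp is undefined (accumulated into the
    result, playing the role of Delta_k^x) and the part on states where
    dp is defined (pushed forward through dp to obtain Delta_{k+1}).
    """
    delta_k = {s: 1.0}           # Delta_0 is the point distribution on s
    result = {}                  # running sum of the Delta_k^x
    for _ in range(steps):       # finite cut-off approximating the limit
        next_k = {}
        for state, p in delta_k.items():
            if state in dp:      # policy moves on: contributes to Delta_k^->
                for t, q in dp[state].items():
                    next_k[t] = next_k.get(t, 0.0) + p * q
            else:                # policy stops here: contributes to Delta_k^x
                result[state] = result.get(state, 0.0) + p
        delta_k = next_k
        if not delta_k:          # no mass left to move: the sum has converged
            break
    return result
```

The infinite sum $\sum_{k \geq 0} \Delta_k^{\times}$ is approximated here by a finite number of iterations; mass that keeps moving forever corresponds to divergence and is lost, which is why a weak derivative may be a proper subdistribution.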
It should be clear that the use of derivative policies limits considerably the scope
for deriving weak derivations. Each particular policy can only derive one weak
derivative, and moreover in a finitary pLTS there are only finitely many derivative policies. Nevertheless, we will show that this limitation is more apparent than real. In Sect. 4.6.1 we saw how the more restrictive extreme policies ep could, in fact, realise the maximum value attainable by any resolution of a finitely branching pLTS. Here we generalise this result by replacing resolutions with arbitrary reward functions.
Definition 6.13 (Rewards and Payoffs) Let S be a set of states. A reward function
is a function $r : S \rightarrow [-1, 1]$. With $S = \{s_1, \ldots, s_n\}$, we often consider a reward function as the n-dimensional vector $\langle r(s_1), \ldots, r(s_n) \rangle$. In this way, we can use the notation $r \cdot \Delta$ to stand for the inner product of two vectors.
Given such a reward function, we define the payoff function $\mathbb{P}^{\max}_{r} : S \rightarrow \mathbb{R}$ by
$$\mathbb{P}^{\max}_{r}(s) \;=\; \sup\{\, r \cdot \Delta \mid s \Longrightarrow \Delta \,\}.$$
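As an illustration (ours, not the book's), the dictionary-based sketch above extends directly to rewards: a reward function becomes a mapping from states to values in [-1, 1], and $r \cdot \Delta$ is a finite inner product over the support of $\Delta$. The states, probabilities and reward values below are made up for the example.

```python
# Continuing the sketch above (hypothetical representation, not from the text):
# a reward function is a dict from states to values in [-1, 1], and r . Delta
# is the inner product taken over the finite support of the subdistribution.

def inner_product(r, delta):
    """Compute r . Delta = sum over states of r(state) * Delta(state)."""
    return sum(r.get(state, 0.0) * p for state, p in delta.items())

# Made-up two-state example: from s the policy either loops back to s or
# moves to the stopping state t, each with probability 1/2, so Der_dp(s)
# is (approximately) the point distribution on t and the realised value
# of r . Der_dp(s) is close to r(t).
dp = {"s": {"s": 0.5, "t": 0.5}}
r = {"s": -1.0, "t": 1.0}
print(inner_product(r, der_dp("s", dp)))   # close to 1.0
```

Each derivative policy realises one such inner product; the payoff $\mathbb{P}^{\max}_{r}(s)$ is the supremum over all weak derivatives of s, and the theorem below states that in a finitary pLTS this supremum is attained by some policy.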
A priori these payoff functions for a given state s are determined by the set of
weak derivatives of s . However, the main result of this section is that they can, in
fact, always be realised by derivative policies.
Theorem 6.8 (Realising Payoffs) In a finitary pLTS, for every reward function r
there exists some derivative policy dp such that
$$\mathbb{P}^{\max}_{r}(s) \;=\; r \cdot \mathsf{Der}_{dp}(s).$$
Proof As with Theorem 4.3, there is a temptation to give a constructive proof here,
defining the effect of the required derivative policy dp at state s by considering the
application of the reward function r to both s and all of its derivatives. However, this
is not possible, as the example below explains.
Instead the proof is nonconstructive, requiring discounted policies. The overall
structure of the proof is similar to that of Theorem 4.3, but the use of (discounted)