Testing Finitary Probabilistic Processes - Semantics of Probabilistic Processes: An Operational Approach

Information Technology Reference

In-Depth Information

1/2

s 1

Fig. 6.3 Max-seeking policies

derivative policies rather than extreme policies makes the details considerably differ-

ent. Consequently, the proof is spelled out in some detail in Sect. 6.5.2 , cumulating

in Theorem 6.10 .

Example 6.10 Let us say that a derivative policy dp is max-seeking with respect to

a reward function r if for all s

∈

S the following requirements are met.

−ₒ ʔ 1 .

1. If dp ( s )

↑

then r ( s )

≥ P

max ( ʔ 1 ) for all s

2. If dp ( s )

= ʔ then

max ( ʔ )

≥

r ( s ) and

−ₒ

ʔ 1 .

What a max-seeking policy does is to evaluate

max ( ʔ )

≥ P

max ( ʔ 1 ) for all s

max in advance, for a given reward

function r , and then label each state s with the payoff value

r max ( s ). The policy

at any state s is then to compare r ( s ) with the expected label values

r max ( ʔ ) (i.e.

−ₒ

ʔ and then to select the greatest

among all those values. Note that for the policy to be well defined, we require that

the pLTS under consideration is finitely branching.

In case that seems obvious, we now consider the pLTS in Fig. 6.3 and let us

apply the above definition of max-seeking policies to the reward function given by

r ( s 0 )

r max )) for each outgoing transition s

Exp ʔ (

r ( s 1 )

1. For both states a payoff of 1 i s attainable ev entually, thus

r max ( s 0 )

r max ( s 1 )

= P

1, because we have s 0 ⃒

s 1 and s 1 ⃒

s 1 . Hence, both

r max -labelled with 1. At state s 0 the policy then makes a choice among

three options: (1) to stay unmoved, yielding immediate payoff r ( s 0 )

states will be

0; (2) to take

−ₒ

the transition s 0

s 1 . Clearly one of

the latter two is chosen — but which? If it is the second, then indeed the maximum

payoff 1 can be achieved. If it is the first, then in fact the overall payoff will be 0

because of divergence, so the policy would fail to attain the maximum payoff 1.

However, for properly discounted max-seeking policies, we show in Proposi-

tion 6.6 that they always attain the maximum payoffs.

With Theorem 6.8 at hand, we are in the position to prove the main result of this

section, which says that in a finitary pLTS the set of weak derivatives from any state

s 0 ; (3) to take the transition s 0

s 01 / 2 ↕

Semantics of Probabilistic Processes: An Operational Approach

Search WWH ::

Custom Search

Home