Testing Finitary Probabilistic Processes - Semantics of Probabilistic Processes: An Operational Approach

Information Technology Reference

In-Depth Information

Lemma 6.12 Given a finitely branching pLTS, discount ʴ and reward function r ,

there always exists a max-seeking policy.

ʴ , r

Proof

max ( s )

can be calculated for each state s . Then we can define a derivative policy dp in the

following way. For any state s ,if r ( s )

Given a pLTS, discount ʴ and reward function r , the discounted payoff

−ₒ

ʴ , max ( ʔ 1 ) for all s

≥

· P

ʔ 1 , then we

−ₒ

set dp undefined at s . Otherwise, we choose a transition s

ʔ among the finite

ʴ , max ( ʔ )

ʴ , max ( ʔ 1 ) for all other

number of outgoing transitions from s such that

≥ P

−ₒ

transitions s

ʔ .

Given a pLTS, discount ʴ , reward function r , and derivative policy dp , we define

the function F ʴ , dp , r

ʔ 1 , and we set dp ( s )

ₒ R

ₒ

ₒ R

:( S

)

( S

) by letting

⊧

⊨

r ( s )

if dp ( s )

↑

F ʴ , dp , r ( f )( s )

(6.16)

⊩

f ( ʔ ) f dp ( s )

= s ∈ ʔ

where f ( ʔ )

ʔ ( s )

f ( s ).

Lemma 6.13 Given a pLTS, discount ʴ< 1 , reward function r , and derivative

policy dp , the function F ʴ , dp , r

has a unique fixed point.

We first show that the function F ʴ , dp , r

Proof

is a contraction mapping. Let f , g be

any two functions of type S

ₒ R

F ʴ , dp , r ( f )

F ʴ , dp , r ( g )

−

F ʴ , dp , r ( f )( s )

F ʴ , dp , r ( g )( s )

sup

−

∈

}

F ʴ , dp , r ( f )( s )

F ʴ , dp , r ( g )( s )

sup

−

∈

S and dp ( s )

↓}

= ʴ ·

sup

{| f ( ʔ )

− g ( ʔ )

|| s ∈ S and dp ( s )

= ʔ for some ʔ }

f ( s )

g ( s )

s ∈

≤

sup

−

}

·|

−

By the Banach unique fixed-point theorem (Theorem 2.8), the function F ʴ , dp , r has a

unique fixed point.

Lemma 6.14 Given a pLTS, discount ʴ, reward function r , and max-seeking policy

dp , the function

ʴ , max is a fixed point of F ʴ , dp , r .

We need to show that F ʴ , dp , r (

ʴ , max )( s )

= P

ʴ , max ( s ) holds for any state s .We

Proof

distinguish two cases.

1. If dp ( s )

, then F ʴ , dp , r (

↑

ʴ , max )( s )

r ( s )

= P

ʴ , max ( s ) as expected.

2. I f dp ( s )

= ʔ , then the arguments are more involved. First we observe that if

s ⃒ ʴ ʔ , then by Definition 6.14 there exist some ʔ 0

, ʔ 0 , ʔ 1 , ʔ such that

−ₒ

ʔ 0 +

ʔ 0 , ʔ 0

ʔ 1 , ʔ 1 ⃒ ʴ ʔ and ʔ =

ʔ 0 +

ʔ . So we can do

the following calculation.

Semantics of Probabilistic Processes: An Operational Approach

Search WWH ::

Custom Search

Home