Testing Finitary Probabilistic Processes - Semantics of Probabilistic Processes: An Operational Approach

Information Technology Reference

In-Depth Information

6.9

Real-Reward Testing

In Sect. 4.4, we introduced a notion of reward testing inspired by [ 12 ]. The idea is to

associate with each success action a nonnegative reward, and performing a success

action means accumulating some reward. The outcomes of this reward testing are

nonnegative expected rewards.

In certain occasions, it is very natural to introduce negative rewards. For example,

this is the case in the theory of Markov decision processes [ 6 ]. Intuitively, we could

understand negative rewards as costs, while positive rewards are often viewed as

benefits or profits. Consider, for instance, the (nonprobabilistic) processes Q 1 and

Q 2 with initial states q 1 and q 2 , respectively, in Fig. 6.5 . Here a represents the action

of making an investment. Assuming that the investment is made by bidding for some

commodity, the ˄ -action represents an unsuccessful bid—if this happens one simply

tries again. Now b represents the action of reaping the benefits of this investment.

Whereas Q 1 models a process in which making the investment is always followed by

an opportunity to reap the benefits, the process Q 2 allows, nondeterministically, for

the possibility that the investment is unsuccessful, so that a does not always lead to a

state where b is enabled. The test T with initial state t , which will be explained later,

allows us to give a negative reward to action ˉ 1 —its cost—and a positive reward to

ˉ 2 .

This leads to the question: if both negative- and positive rewards are allowed, how

would the original reward-testing semantics change? 3 We refer to the more relaxed

form of testing as real-reward testing and the original one as nonnegative-reward

testing .

The power of real-reward testing is illustrated in Fig. 6.5 . The two (nonprobabilis-

tic) processes in the left- and central diagrams are equivalent under (probabilistic)

may- as well as must testing; the ˄ -loops in the initial states cause both processes

to fail any nontrivial must test. Yet, if a reward of

2 is associated with performing

the action ˉ 1 , and a reward of 4 with the subsequent performance of ˉ 2 , it turns out

that in the first process the net reward is either 0, if the process remains stuck in its

initial state, or positive, whereas running the second process may yield a loss. See

Example 6.27 for details of how these rewards are assigned, and how net rewards are

associated with the application of tests such as T . This example shows that for pro-

cesses that may exhibit divergence, real-reward testing is more discriminating than

nonnegative-reward testing, or other forms of probabilistic testing. It also illustrates

that the extra power is relevant in applications.

−

3

1]

can be converted into a nonnegative assignment simply by adding 1 to all of them. But that would

not preserve the testing order in the case of zero-outcomes that resulted from a process' s failing to

reach any success state at all: those zeroes would remain zero.

One might suspect no change at all, for any assignment of rewards from the interval [

−

1,

+

Semantics of Probabilistic Processes: An Operational Approach

Search WWH ::

Custom Search

Home