Now, let $\mathcal{I}$ be an IMC with state space $S$, initial state $s$, and let $G \subseteq S$ be a set of goal states and $I \subseteq \mathbb{R}$ a time interval with rational bounds. The time-bounded reachability event $\Diamond^{I} G$ is defined as:
\[
\Diamond^{I} G \;=\; \{\, \pi \in \mathrm{Paths}^{\omega}(s) \mid \exists t \in I.\ \pi@t \in G \,\}
\]
It thus contains all infinite paths starting in state $s$ that hit a state in $G$ at some time point that lies in the interval $I$.
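To make this concrete, here is a minimal sketch in Python of how a finite prefix of a timed path can be checked for witnessing $\Diamond^{I} G$; the encoding of paths as lists of (state, sojourn time) pairs and all names are illustrative assumptions, not notation from the paper. The $i$-th state is occupied during $[t_i, t_i + d_i)$, and a prefix witnesses the event iff the occupancy of some $G$-state overlaps $I$.

```python
def witnesses(prefix, interval, goal):
    """Check whether a finite timed path prefix already witnesses the event.

    prefix   -- list of (state, sojourn_time) pairs along the path
    interval -- pair (lo, hi) of time bounds for I
    goal     -- set of goal states G
    """
    lo, hi = interval
    t = 0.0  # total time elapsed when the current state is entered
    for state, d in prefix:
        # 'state' is occupied during [t, t + d); the event holds if this
        # occupancy overlaps I = [lo, hi] for a goal state
        if state in goal and t <= hi and t + d > lo:
            return True
        t += d
    return False

# Hypothetical prefix: a goal state s4 is entered at time 1.5, inside I = [0, 2].
print(witnesses([("s0", 1.5), ("s4", 0.7)], (0.0, 2.0), {"s4"}))  # -> True
```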
We are basically interested in the probability of the event $\Diamond^{I} G$. The problem, however, is that, due to the presence of non-determinism, this probability is not uniquely defined. To see this, consider the IMC of Fig. 1 with $G = \{s_4\}$. The probability of the event $\Diamond^{[0,2]} G$ for state $s_2$, for instance, depends on how the non-deterministic choice between $\alpha$ and $\beta$ has been resolved in state $s_2$: if $\beta$ is chosen, the probability equals one; otherwise it depends on the choice in state $s_1$. We therefore consider the probability of the event $\Diamond^{I} G$ relative to a specific resolution of the non-determinism in the IMC. Such a
resolution is defined by a total-time deterministic positional policy $D$, say. It goes beyond the scope of this paper to fully define this class of policies; for the remainder of this paper, it suffices to consider $D$ as a function that takes as arguments the current state $s_i$, say, and the total time that has elapsed along the path leading to $s_i$, including the time already spent in state $s_i$ so far. Based on this information, $D$ selects one of the actions of an outgoing transition of $s_i$.
Example 2. Consider again the IMC of Fig. 1. Assume the execution of the IMC so far is $s_0 \xrightarrow{\bot,\,3.0} s_1 \xrightarrow{\bot,\,2.0} s_2$. A choice between the actions $\alpha$ and $\beta$ has to be made in $s_2$. An example policy $D$ is $D(s_2, t) = \alpha$ if $t \leq 10$, and $D(s_2, t) = \beta$ otherwise. Thus, if the residence time in the current state $s_2$ is $d$ time units, say, then $\alpha$ will be chosen if $d \leq 5$ (as 5 time units have passed until reaching $s_2$), whereas $\beta$ will be chosen if $d > 5$.
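Encoded as a function, this example policy might look as follows; this is a Python sketch in which the state and action names come from the example, while the encoding itself is an illustrative assumption.

```python
def D(state, total_time):
    """Total-time positional policy of Example 2: maps the current state
    and the total elapsed time to an action."""
    if state == "s2":
        return "alpha" if total_time <= 10 else "beta"
    raise ValueError(f"no non-deterministic choice to resolve in {state}")

# Reaching s2 took 3.0 + 2.0 = 5 time units, so a residence time d in s2
# gives total_time = 5 + d: alpha iff d <= 5, beta iff d > 5.
print(D("s2", 5 + 4))  # -> 'alpha'
print(D("s2", 5 + 6))  # -> 'beta'
```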
We can now be more precise about the measure of interest: we are interested in maximizing the probability of $\Diamond^{I} G$ over all possible total-time dependent policies, i.e., we want to determine
\[
p_{\max}(s, I) \;=\; \sup_{D} \Pr\nolimits_{s,D}\bigl(\Diamond^{I} G\bigr) \qquad \text{for timed policy } D.
\]
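How to compute this supremum is discussed later; as a purely illustrative aid, the following Python sketch estimates the inner quantity $\Pr_{s,D}(\Diamond^{I} G)$ for one fixed policy $D$ by Monte Carlo simulation. The encoding of the (closed) IMC is a hypothetical assumption: Markovian states carry rate-labelled transitions that race against each other, interactive states are resolved instantaneously by the policy (maximal progress), and the IMC is assumed to have no cycles of interactive transitions.

```python
import random

def run_hits(imc, policy, s0, interval, goal):
    """Sample one timed path from s0; True iff the path lies in the event."""
    lo, hi = interval
    t, s = 0.0, s0
    while t <= hi:
        kind, trans = imc[s]
        if kind == "interactive":
            # left instantaneously (maximal progress): occupied only at instant t
            if s in goal and lo <= t <= hi:
                return True
            s = trans[policy(s, t)]  # dict action -> successor, resolved by D
        else:
            # Markovian state: the outgoing transitions race, so the sojourn
            # time is exponential with the total exit rate
            rates = [rate for rate, _ in trans]
            d = random.expovariate(sum(rates))
            if s in goal and t + d > lo:  # occupancy [t, t+d) overlaps [lo, hi]
                return True
            s = random.choices([succ for _, succ in trans], weights=rates)[0]
            t += d
    return False

def estimate(imc, policy, s0, interval, goal, runs=100_000):
    """Monte Carlo estimate of the probability of the event under one policy."""
    return sum(run_hits(imc, policy, s0, interval, goal) for _ in range(runs)) / runs
```

Note that this only estimates the probability for the one given policy, e.g. the function `D` sketched above; $p_{\max}(s, I)$ denotes the supremum over all timed policies.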
One may wonder whether we should not consider more powerful classes of policies, such as randomised ones, or policies that may base their decision on the entire computation so far, but this does not lead to a larger value for $p_{\max}(s, I)$:
Theorem 4. [57] Total-time deterministic positional policies are optimal for maximising $\Pr(\Diamond^{I} G)$.
Reachability probabilities. Before discussing how to compute $p_{\max}(s, I)$, let us first discuss a simpler variant of the event $\Diamond^{I} G$. This case occurs if $\sup I = \infty$ and $\inf I = 0$. As the time interval does not impose any timing constraint anymore, this amounts to a simple reachability event:
\[
\Diamond G \;=\; \{\, \pi \in \mathrm{Paths}^{\omega}(s) \mid \exists i \in \mathbb{N}.\ \pi[i] \in G \,\}
\]
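For this untimed event, the timing of the Markovian transitions is irrelevant, so maximal reachability probabilities can be approximated by standard value iteration on the underlying discrete model. The sketch below is textbook value iteration, not the specific algorithm of the paper, and it assumes a hypothetical encoding that maps each state to the list of probability distributions it can non-deterministically choose from.

```python
def max_reachability(mdp, goal, eps=1e-9):
    """mdp  -- state -> list of choices, each a list of (probability, successor)
    goal -- set of goal states
    Returns, per state, an eps-converged approximation of the maximal
    probability of eventually reaching a goal state."""
    p = {s: 1.0 if s in goal else 0.0 for s in mdp}
    while True:
        delta = 0.0
        for s in mdp:
            if s in goal or not mdp[s]:
                continue  # goal states stay at 1, deadlock states at 0
            best = max(sum(pr * p[succ] for pr, succ in choice)
                       for choice in mdp[s])
            delta = max(delta, best - p[s])
            p[s] = best
        if delta < eps:
            return p
```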
 