Now, let I be an IMC with state space S, initial state s, and let G ⊆ S be a set of goal states and I ⊆ ℝ a time interval with rational bounds. The time-bounded reachability event ◊^I G is defined as:

◊^I G = { π ∈ Paths^ω(s) | ∃ t ∈ I. ∃ s' ∈ π@t. s' ∈ G }
It thus contains all infinite paths starting in state s that hit a state in G at some time point that lies in the interval I. We are basically interested in the probability of the event ◊^I G. The problem, however, is that, due to the presence of non-determinism, this is not uniquely defined. To see this, consider the IMC of Fig. 1 with G = {s_4}. The probability of ◊^[0,2] G for state s_2, for instance, now depends on how the non-deterministic choice between α and β has been resolved in state s_2. If β is chosen the probability equals one; otherwise it depends on the choice in state s_1. We therefore consider the probability of the event ◊^I G relative to a specific resolution of the non-determinism in the IMC.
Such a resolution is defined by a total-time deterministic positional policy D, say. It goes beyond the scope of this paper to fully define this class of policies. For the sake of the remainder of this paper, it suffices to consider D as a function that takes as argument the current state s_i, say, and the total time that has elapsed along the path leading to s_i, including the time already spent in state s_i so far. Based on this information, D will select one of the actions of an outgoing transition of s_i.
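Before turning to an example, the following sketch makes the event ◊^I G concrete on finite behaviour. It is not part of the paper: the encoding of a path prefix as a list of (state, sojourn time) pairs, with sojourn time 0 for states left by an immediate interactive transition, is a simplifying assumption made here for illustration.

```python
from typing import Hashable, List, Set, Tuple

State = Hashable

def hits_goal_within(path: List[Tuple[State, float]],
                     goal: Set[State],
                     interval: Tuple[float, float]) -> bool:
    """Check the time-bounded reachability event on a finite timed path prefix.

    `path` is a hypothetical encoding: (state, sojourn_time) pairs, where
    sojourn_time == 0.0 marks a state left immediately by an interactive
    transition.  `interval` = (a, b) stands for I = [a, b].
    Returns True iff some state of `goal` is occupied at a time point in I.
    """
    a, b = interval
    now = 0.0                      # total time elapsed so far
    for state, sojourn in path:
        enter, leave = now, now + sojourn
        if state in goal and enter <= b and leave >= a:
            # the state is occupied throughout [enter, leave]; it is hit
            # within I iff the two intervals overlap (a zero-sojourn state
            # is occupied at the single instant `enter`)
            return True
        now = leave
        if now > b:                # states reached later cannot lie in I
            break
    return False
```

For instance, for a prefix of the run used in Example 2 below, hits_goal_within([("s0", 3.0), ("s1", 2.0)], {"s4"}, (0.0, 2.0)) evaluates to False, since only s_0, which is not a goal state, is occupied during [0, 2].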
Example 2.
Consider again the IMC of Fig. 1. Assume the execution of the IMC so far is s_0 −−⊥,3.0−−→ s_1 −−⊥,2.0−−→ s_2. A choice between the actions α and β has to be made in s_2. An example policy D is D(s_2, t) = α if t ≤ 10, and D(s_2, t) = β otherwise. Thus, if the residence time in the current state s_2 is d time units, say, then α will be chosen if d ≤ 5 (as 5 time units have passed until reaching s_2), whereas β will be chosen if d > 5.
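As a small illustration (a sketch, not part of the paper's formal development), this policy can be written directly as a function of the current state and the total elapsed time; the state and action names are those of Fig. 1.

```python
def policy_D(state: str, total_time: float) -> str:
    """Total-time deterministic positional policy of Example 2 (sketch).

    `total_time` is the time elapsed along the whole path so far, including
    the time already spent in the current state.  Only the choice in s2 from
    Example 2 is modelled; a full policy for Fig. 1 would also have to
    resolve the choice in s1.
    """
    if state == "s2":
        return "alpha" if total_time <= 10.0 else "beta"
    raise ValueError(f"no choice modelled for state {state!r}")
```

With 5 time units spent before entering s_2 (3.0 in s_0 plus 2.0 in s_1), policy_D("s2", 5.0 + d) yields "alpha" exactly when the residence time d in s_2 is at most 5, as in the example.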
We can now be more precise about the measure-of-interest: we are interested in maximizing the probability of ◊^I G for all possible total-time dependent policies, i.e., we want to determine

p_max(s, I) = sup_D Pr_{s,D}(◊^I G)   for timed policy D.
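The definition suggests a crude, purely illustrative way to read this quantity: fix one policy D, estimate Pr_{s,D}(◊^I G), and take the best value over whatever candidate policies one tries; this only under-approximates p_max(s, I) and is not the algorithm developed in the paper. In the sketch below, the dictionary-based encoding of a closed IMC (immediate, policy-resolved interactive transitions versus exponentially delayed Markovian ones) and all identifiers are assumptions made for illustration only.

```python
import random
from typing import Callable, Dict, Set, Tuple

# Hypothetical encoding of a closed IMC (maximal progress: interactive
# transitions happen immediately and take precedence over Markovian delays).
Interactive = Dict[str, Dict[str, str]]    # state -> {action -> successor}
Markovian   = Dict[str, Dict[str, float]]  # state -> {successor -> rate}
Policy      = Callable[[str, float], str]  # (state, elapsed time) -> action

def run_hits(inter: Interactive, markov: Markovian, policy: Policy,
             start: str, goal: Set[str],
             interval: Tuple[float, float], rng: random.Random) -> bool:
    """One simulated run: does it realise the event <>^I G?
    (Assumes no cycles of purely interactive transitions.)"""
    a, b = interval
    state, now = start, 0.0
    while now <= b:
        if state in goal and now >= a:            # goal occupied at a time in I
            return True
        if state in inter:                        # immediate step chosen by D
            state = inter[state][policy(state, now)]
            continue
        rates = markov.get(state, {})
        if not rates:                             # absorbing non-goal state
            return False
        sojourn = rng.expovariate(sum(rates.values()))
        if state in goal and now + sojourn >= a:  # sojourn reaches into I
            return True
        now += sojourn
        succs, weights = zip(*rates.items())
        state = rng.choices(succs, weights=weights)[0]
    return False

def estimate_prob(inter, markov, policy, start, goal, interval,
                  runs: int = 10_000, seed: int = 0) -> float:
    """Monte Carlo estimate of Pr_{s,D}(<>^I G) for one fixed policy D."""
    rng = random.Random(seed)
    hits = sum(run_hits(inter, markov, policy, start, goal, interval, rng)
               for _ in range(runs))
    return hits / runs
```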
One may wonder whether we should not consider more powerful classes of poli-
cies, such as randomised ones, or policies that may base their decision on the
entire computation so far, but this does not lead to a larger value for
p_max(s, I):
Theorem 4.
[57] Total-time deterministic positional policies are optimal for
maximising
Pr(◊^I G).
Reachability probabilities.
Before discussing how to compute p_max(s, I), let us first discuss a simpler variant of the event ◊^I G. This case occurs if sup I = ∞ and inf I = 0. As the time interval does not impose any timing constraint anymore, this amounts to a simple reachability event:

◊G = { π ∈ Paths^ω(s) | ∃ i ∈ ℕ. π[i] ∈ G }
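Since timing no longer matters for this event, the maximal probability of reaching G can be read off a time-abstract, MDP-like view of the IMC (Markovian states contribute their branching probabilities, i.e. the rate to each successor divided by the exit rate; interactive states one choice per action). The sketch below shows standard value iteration for maximal reachability probabilities under that assumed encoding; it is illustrative only and not necessarily the algorithm pursued in the remainder of the paper.

```python
from typing import Dict, List, Set

# Hypothetical time-abstract view: each state offers a list of distributions
# over successors (interactive states: one Dirac distribution per action;
# Markovian states: a single distribution of branching probabilities).
# Successors that do not appear as keys are treated as absorbing.
MDP = Dict[str, List[Dict[str, float]]]

def max_reach_prob(mdp: MDP, goal: Set[str],
                   eps: float = 1e-9, max_iters: int = 100_000) -> Dict[str, float]:
    """Value iteration for maximal time-unbounded reachability probabilities.

    Returns an eps-approximation of sup_D Pr_{s,D}(<> G) for every state s
    of the assumed time-abstract model.
    """
    states = set(mdp) | {t for dists in mdp.values() for d in dists for t in d}
    value = {s: (1.0 if s in goal else 0.0) for s in states}
    for _ in range(max_iters):
        delta = 0.0
        for s in mdp:
            if s in goal:
                continue                      # goal states keep value 1
            best = max((sum(p * value[t] for t, p in dist.items())
                        for dist in mdp[s]), default=0.0)
            delta = max(delta, abs(best - value[s]))
            value[s] = best
        if delta < eps:                       # converged (from below)
            break
    return value
```

Under these assumptions the iteration converges monotonically from below to the maximal reachability probabilities of the time-abstract model.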