Information Technology Reference
In-Depth Information
Intuitively an extreme policy
ep
determines a computation through the pLTS. But
this set of possible computations, unlike resolutions as defined in Definition
4.3
,
is very restrictive. Policy
ep
decides at each state, once and for all, which of the
available
˄
choices to take; it does not interpolate, and since it is a function of the
state, it makes the same choice on every visit. But there are two constraints:
(i)
Condition (a) ensures an in-built preference for reporting success; if the state
is successful, the policy must also report success;
(ii)
Condition (b), together with (a), means that
ep
(
s
) is defined whenever
s
−ₒ
.
This ensures that the policy cannot decide to stop at a state
s
if there is a
possibility of proceeding from
s
; the computation must proceed, if it is possible
to proceed.
An extreme policy
ep
determines a deterministic pLTS
S
,
ʩ
˄
,
ₒ
ep
, where
ₒ
ep
is determined by
s
ₒ
ep
ep
(
s
). Moreover for any subdistribution
ʔ
over
S
, it de-
termines the obvious resolution
, with the identity as the associated
resolving function. Indeed it is possible to show that every static resolution is
determined in this manner by some extreme policy.
S
,
ʔ
,
ₒ
ep
Proposition 4.5
Let ʔ be a subdistribution in a finitely branching pLTS
S
,
ʩ
˄
,
ₒ
.
ₒ
R
Then there exists a static resolution of ʔ, say
R
,
ʘ
,
such that
Exp
ʘ
V
h
=
Exp
ʔ
V
min
h
for any reward vector h.
Proof
We exhibit the required resolution by defining an extreme policy over
S
;in
other words, the resolution will take the form
S
,
ʘ
,
ₒ
ep
for some extreme policy
ep
(
).
We say the extreme policy
ep
(
−
−
)is
min-seeking
if its domain is
{
s
∈
S
|
s
−ₒ}
and it satisfies:
˄
−ₒ
˄
−ₒ
ˉ
min
(
ep
(
s
))
min
(
ʔ
) whenever
s
if
s
−ₒ
but
s
then
V
≤ V
ʔ.
Note that by design, a min-seeking policy satisfies:
˄
−ₒ
ˉ
h
min
(
s
)
h
min
(
ep
(
s
))
.
if
s
−ₒ
but
s
then
V
= V
(4.17)
In a finitely branching pLTS, it is straightforward to define a min-seeking extreme
policy:
ˉ
−ₒ
ˉ
−ₒ
(i)
If
s
, then let
ep
(
s
)beany
ʔ
such that
s
ʔ
.
˄
−ₒ
˄
−ₒ
(ii)
Otherwise, if
s
, let
{
ʔ
1
,
...ʔ
n
}
be the finite nonempty set
{
ʔ
|
s
ʔ
}
.
min
(
ʔ
k
)
min
(
ʔ
j
) for every
Now let
ep
(
s
)beany
ʔ
k
satisfying the property
V
≤ V
≤
≤
n
; at least one such
ʔ
k
must exist.
We now show that the static resolution determined by such a policy,
1
j
ₒ
ep
satisfies the requirements of the proposition. For the sake of clarity, let us write
V
S
,
ʘ
,
ep
(
ʔ
) for the value realised for
ʔ
in this resolution.
Search WWH ::
Custom Search