Information Technology Reference
In-Depth Information
Intuitively an extreme policy ep determines a computation through the pLTS. But
this set of possible computations, unlike resolutions as defined in Definition 4.3 ,
is very restrictive. Policy ep decides at each state, once and for all, which of the
available ˄ choices to take; it does not interpolate, and since it is a function of the
state, it makes the same choice on every visit. But there are two constraints:
(i)
Condition (a) ensures an in-built preference for reporting success; if the state
is successful, the policy must also report success;
(ii)
Condition (b), together with (a), means that ep ( s ) is defined whenever s −ₒ .
This ensures that the policy cannot decide to stop at a state s if there is a
possibility of proceeding from s ; the computation must proceed, if it is possible
to proceed.
An extreme policy ep determines a deterministic pLTS
S , ʩ ˄ ,
ep
, where
ep
is determined by s
ep ep ( s ). Moreover for any subdistribution ʔ over S , it de-
termines the obvious resolution
, with the identity as the associated
resolving function. Indeed it is possible to show that every static resolution is
determined in this manner by some extreme policy.
S , ʔ ,
ep
Proposition 4.5
Let ʔ be a subdistribution in a finitely branching pLTS
S , ʩ ˄ ,
.
R
Then there exists a static resolution of ʔ, say
R , ʘ ,
such that
Exp ʘ V
h =
Exp ʔ V
min
h
for any reward vector h.
Proof We exhibit the required resolution by defining an extreme policy over S ;in
other words, the resolution will take the form
S , ʘ ,
ep
for some extreme policy
ep (
).
We say the extreme policy ep (
)is min-seeking if its domain is
{
s
S
|
s
−ₒ}
and it satisfies:
˄
−ₒ
˄
−ₒ
ˉ
min ( ep ( s ))
min ( ʔ ) whenever s
if s
−ₒ
but s
then
V
≤ V
ʔ.
Note that by design, a min-seeking policy satisfies:
˄
−ₒ
ˉ
h
min ( s )
h
min ( ep ( s )) .
if s
−ₒ
but s
then
V
= V
(4.17)
In a finitely branching pLTS, it is straightforward to define a min-seeking extreme
policy:
ˉ
−ₒ
ˉ
−ₒ
(i)
If s
, then let ep ( s )beany ʔ such that s
ʔ .
˄
−ₒ
˄
−ₒ
(ii)
Otherwise, if s
, let
{
ʔ 1 , ...ʔ n }
be the finite nonempty set
{
ʔ
|
s
ʔ
}
.
min ( ʔ k )
min ( ʔ j ) for every
Now let ep ( s )beany ʔ k satisfying the property
V
≤ V
n ; at least one such ʔ k must exist.
We now show that the static resolution determined by such a policy,
1
j
ep
satisfies the requirements of the proposition. For the sake of clarity, let us write
V
S , ʘ ,
ep ( ʔ ) for the value realised for ʔ in this resolution.
Search WWH ::




Custom Search