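\[
\begin{aligned}
\mathbb{P}^{\delta,r}_{\max}(s)
&= \sup\{\, r\cdot\Delta' \mid \overline{s} \Rightarrow_{\delta} \Delta' \,\} \\
&= \sup\{\, r\cdot(\Delta_0^{\times} + \delta\cdot\Delta'') \mid \overline{s} = \Delta_0^{\to} + \Delta_0^{\times},\ \Delta_0^{\to} \xrightarrow{\tau} \Delta_1, \text{ and } \Delta_1 \Rightarrow_{\delta} \Delta'' \text{ for some } \Delta_0^{\to}, \Delta_0^{\times}, \Delta_1, \Delta'' \,\} \\
&= \sup\{\, r\cdot\Delta_0^{\times} + \delta\cdot r\cdot\Delta'' \mid \overline{s} = \Delta_0^{\to} + \Delta_0^{\times},\ \Delta_0^{\to} \xrightarrow{\tau} \Delta_1, \text{ and } \Delta_1 \Rightarrow_{\delta} \Delta'' \text{ for some } \Delta_0^{\to}, \Delta_0^{\times}, \Delta_1, \Delta'' \,\} \\
&= \sup\{\, r\cdot\Delta_0^{\times} + \delta\cdot\sup\{\, r\cdot\Delta'' \mid \Delta_1 \Rightarrow_{\delta} \Delta'' \,\} \mid \overline{s} = \Delta_0^{\to} + \Delta_0^{\times} \text{ and } \Delta_0^{\to} \xrightarrow{\tau} \Delta_1 \text{ for some } \Delta_0^{\to}, \Delta_0^{\times}, \Delta_1 \,\} \\
&= \sup\{\, r\cdot\Delta_0^{\times} + \delta\cdot\mathbb{P}^{\delta,r}_{\max}(\Delta_1) \mid \overline{s} = \Delta_0^{\to} + \Delta_0^{\times} \text{ and } \Delta_0^{\to} \xrightarrow{\tau} \Delta_1 \text{ for some } \Delta_0^{\to}, \Delta_0^{\times}, \Delta_1 \,\} \\
&= \sup\{\, (1-p)\cdot r(s) + p\delta\cdot\mathbb{P}^{\delta,r}_{\max}(\Delta_1) \mid p \in [0,1] \text{ and } s \xrightarrow{\tau} \Delta_1 \text{ for some } \Delta_1 \,\}
 \qquad \text{[$\overline{s}$ can be split into $p\,\overline{s} + (1-p)\,\overline{s}$ only]} \\
&= \sup\{\, (1-p)\cdot r(s) + p\delta\cdot\sup\{\, \mathbb{P}^{\delta,r}_{\max}(\Delta_1) \mid s \xrightarrow{\tau} \Delta_1 \,\} \mid p \in [0,1] \,\} \\
&= \max\bigl(\, r(s),\ \delta\cdot\sup\{\, \mathbb{P}^{\delta,r}_{\max}(\Delta_1) \mid s \xrightarrow{\tau} \Delta_1 \,\} \,\bigr) \\
&= \delta\cdot\mathbb{P}^{\delta,r}_{\max}(\Delta)
 \qquad \text{[as $dp$ is max-seeking]} \\
&= F^{\delta,dp,r}(\mathbb{P}^{\delta,r}_{\max})(s)
\end{aligned}
\]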
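As a rough illustration of the fixed-point reading of this chain, the sketch below iterates the equation $V(s) = \max(r(s),\ \delta\cdot\sup\{V(\Delta_1) \mid s \xrightarrow{\tau} \Delta_1\})$ on a small finite system. The step from the supremum over $p \in [0,1]$ to the maximum reflects that a convex combination of two numbers is largest at $p = 0$ or $p = 1$. Nothing here is taken from the text: the names `bellman_step` and `max_payoff`, the dictionary encoding, and the example states are hypothetical. The sketch also assumes finitely many states and transitions, a discount factor below 1, that the payoff of a distribution is the weighted sum of the payoffs of its states, and that a state without $\tau$ transitions has value $r(s)$.

```python
def bellman_step(value, reward, tau, delta):
    """Apply V(s) = max(r(s), delta * best tau-successor value) once."""
    new_value = {}
    for s in reward:
        options = [reward[s]]  # p = 0: stop in s and collect r(s)
        for dist in tau.get(s, []):  # p = 1: take a tau step to dist
            # Assumption: value of a distribution is the weighted sum over states.
            options.append(delta * sum(p * value[t] for t, p in dist.items()))
        new_value[s] = max(options)
    return new_value


def max_payoff(reward, tau, delta, tol=1e-12, max_iter=10_000):
    """Iterate bellman_step until successive values differ by less than tol."""
    value = dict(reward)  # start from "stop immediately everywhere"
    for _ in range(max_iter):
        new_value = bellman_step(value, reward, tau, delta)
        if max(abs(new_value[s] - value[s]) for s in reward) < tol:
            return new_value
        value = new_value
    return value


# Hypothetical three-state system: s0 may move to a fair coin flip
# between s1 and s2, or loop back to itself; s1 and s2 cannot move.
reward = {"s0": 0.3, "s1": 1.0, "s2": 0.0}
tau = {"s0": [{"s1": 0.5, "s2": 0.5}, {"s0": 1.0}]}
values = max_payoff(reward, tau, delta=0.8)
print(values["s0"])
```

In this example stopping in `s0` pays 0.3, while moving to the coin flip pays 0.8 · 0.5 = 0.4, so the value of `s0` is 0.4 and a max-seeking policy would choose the coin-flip transition there.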
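Definition 6.17 Let $\Delta$ be a subdistribution and $dp$ a static derivative policy. We define a collection of subdistributions $\Delta_k$ as follows.

\[
\begin{aligned}
\Delta_0 &= \Delta \\
\Delta_{k+1} &= \sum\{\, \Delta_k(s)\cdot dp(s) \mid s \in \lceil\Delta_k\rceil \text{ and } dp(s)\downarrow \,\} \qquad \text{for all } k \ge 0.
\end{aligned}
\]

Then $\Delta_k^{\times}$ is obtained from $\Delta_k$ by letting

\[
\Delta_k^{\times}(s) =
\begin{cases}
0 & \text{if } dp(s)\downarrow \\
\Delta_k(s) & \text{otherwise}
\end{cases}
\]

for all $k \ge 0$. Then we write $\Delta \Rightarrow_{\delta,dp} \Delta'$ for the discounted weak derivation that determines a unique subdistribution $\Delta'$ with

\[
\Delta' = \sum_{k=0}^{\infty} \delta^{k}\,\Delta_k^{\times}.
\]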
In other words, if $\Delta \Rightarrow_{\delta,dp} \Delta'$ then $\Delta'$ comes from the discounted weak derivation $\Delta \Rightarrow_{\delta} \Delta'$ that is constructed by following the derivative policy $dp$ when choosing $\tau$ transitions from each state. In the special case when the discount factor $\delta = 1$, we see that $\Rightarrow_{1,dp}$ becomes $\Rightarrow_{dp}$ as defined on page 176.
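The construction in Definition 6.17 can be sketched as a short loop. This is only an illustration under stated assumptions: subdistributions are encoded as dictionaries from state names to probabilities, the policy `dp` is a dictionary that is defined exactly on the states where $dp(s)\downarrow$, and the infinite sum is truncated after a fixed number of terms. The function name `discounted_derivative` and the example system are hypothetical.

```python
def discounted_derivative(delta0, dp, discount, steps=200):
    """Approximate the sum over k of discount**k * (stopped part of Delta_k).

    delta0:   initial subdistribution, as {state: probability}
    dp:       static policy, {state: {successor: probability}};
              a state missing from dp is one where the policy is undefined
    discount: the discount factor
    steps:    number of terms kept (an assumption made for this sketch)
    """
    current = dict(delta0)  # Delta_k
    result = {}             # running partial sum for Delta'
    weight = 1.0            # discount**k
    for _ in range(steps):
        nxt = {}            # Delta_{k+1}
        for s, mass in current.items():
            if s in dp:
                # dp(s) is defined: the mass moves on along dp(s).
                for t, p in dp[s].items():
                    nxt[t] = nxt.get(t, 0.0) + mass * p
            else:
                # dp(s) is undefined: the mass stops here, discounted.
                result[s] = result.get(s, 0.0) + weight * mass
        if not nxt:
            break
        current = nxt
        weight *= discount
    return result


# Hypothetical example: from s0 the policy returns to s0 or moves to s1
# with equal probability; the policy is undefined on s1.
dp = {"s0": {"s0": 0.5, "s1": 0.5}}
limit = discounted_derivative({"s0": 1.0}, dp, discount=0.8)
print(limit)
```

Here the stopped mass at step $k \ge 1$ is $0.5^k$ on `s1`, so the resulting subdistribution assigns `s1` the value $\sum_{k \ge 1} (0.8 \cdot 0.5)^k = 2/3$ and nothing to `s0`. Calling the same function with `discount=1.0` gives a mass on `s1` that approaches 1, matching the undiscounted special case.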