Information Technology Reference
In-Depth Information
4.3
Uncontrolled Subproblem
Solving the uncontrolled subproblem involves using the probabilistic model of the un-
controlled dynamics to infer a distribution over
τ
for each uncontrolled state
s
u
.This
distribution is referred to as the entry time distribution because it represents the distri-
bution over the time for
s
u
to enter
G
. The probability that
s
u
enters
G
in
τ
steps is
denoted
D
τ
(
s
u
)
and may be computed using dynamic programming. The probability
that
τ
=0
is given by
D
0
(
s
u
)=
1
if
s
u
∈
G
,
(6)
0
otherwise.
The probability that
τ
=
k
,for
k>
0
, is computed from
D
k−
1
as follows:
D
k
(
s
u
)=
0
if
s
u
∈
G
,
s
u
(7)
T
(
s
u
,s
u
)
D
k−
1
(
s
u
)
otherwise.
Depending on
s
u
, there may be some probability that
s
u
does not enter
G
within
K
steps. This probability is denoted
D
K
(
s
u
)
and may be computed from
D
0
,...,D
K
:
K
D
K
(
s
u
)=1
−
D
k
(
s
u
)
.
(8)
k
=0
The sequence
D
0
,...,D
K
,D
K
is stored in a table with
O
(
K
)
entries. Multilin-
ear interpolation of the distributions may be used to determine
D
τ
(
x
u
)
at an arbitrary
continuous state
x
u
.
|
S
u
|
4.4
Online Solution
After
J
0
,...,J
K
,J
K
and
D
0
,...,D
K
,D
K
have been computed offline, they are used
together online to determine the approximately optimal action to execute from the cur-
rent state. For any discrete state
s
in the original state space,
J
K
(
s, a
)
may be computed
as follows:
K
J
K
(
s, a
)=
D
K
(
s
u
)
J
K
(
s
c
,a
)+
D
k
(
s
u
)
J
k
(
s
c
,a
)
,
(9)
k
=0
where
s
u
is the discrete uncontrolled state and
s
c
is the discrete controlled state asso-
ciated with
s
. Combining the controlled and uncontrolled solutions online in this way
requires time linear in the size of the horizon. Multilinear interpolation can be used
to estimate
J
K
(
x
,a
)
for an arbitrary state
x
, and from this the optimal action may be
obtained.
The memory required to store
J
K
(
s, a
)
is
O
(
|
A
||
S
c
||
S
u
|
)
. However, the method in
this section allows the solution to be represented using
O
(
K
|
A
||
S
c
|
+
K
|
S
u
|
)
storage,
which can be a tremendous savings when
are large. For the collision
avoidance problem discussed in the next section, this method allows the cost table to
be stored in 500 MB instead of over 1 TB. The offline computational savings are even
more significant.
|
S
c
|
and
|
S
u
|