Information Technology Reference
In-Depth Information
4.3
Uncontrolled Subproblem
Solving the uncontrolled subproblem involves using the probabilistic model of the un-
controlled dynamics to infer a distribution over τ for each uncontrolled state s u .This
distribution is referred to as the entry time distribution because it represents the distri-
bution over the time for s u to enter G . The probability that s u enters G in τ steps is
denoted D τ ( s u ) and may be computed using dynamic programming. The probability
that τ =0 is given by
D 0 ( s u )= 1
if s u
G ,
(6)
0
otherwise.
The probability that τ = k ,for k> 0 , is computed from D k− 1 as follows:
D k ( s u )= 0
if s u
G ,
s u
(7)
T ( s u ,s u ) D k− 1 ( s u )
otherwise.
Depending on s u , there may be some probability that s u does not enter G within K
steps. This probability is denoted D K ( s u ) and may be computed from D 0 ,...,D K :
K
D K ( s u )=1
D k ( s u ) .
(8)
k =0
The sequence D 0 ,...,D K ,D K is stored in a table with O ( K
) entries. Multilin-
ear interpolation of the distributions may be used to determine D τ ( x u ) at an arbitrary
continuous state x u .
|
S u |
4.4
Online Solution
After J 0 ,...,J K ,J K and D 0 ,...,D K ,D K have been computed offline, they are used
together online to determine the approximately optimal action to execute from the cur-
rent state. For any discrete state s in the original state space, J K ( s, a ) may be computed
as follows:
K
J K ( s, a )= D K ( s u ) J K ( s c ,a )+
D k ( s u ) J k ( s c ,a ) ,
(9)
k =0
where s u is the discrete uncontrolled state and s c is the discrete controlled state asso-
ciated with s . Combining the controlled and uncontrolled solutions online in this way
requires time linear in the size of the horizon. Multilinear interpolation can be used
to estimate J K ( x ,a ) for an arbitrary state x , and from this the optimal action may be
obtained.
The memory required to store J K ( s, a ) is O (
|
A
||
S c ||
S u |
) . However, the method in
this section allows the solution to be represented using O ( K
|
A
||
S c |
+ K
|
S u |
) storage,
which can be a tremendous savings when
are large. For the collision
avoidance problem discussed in the next section, this method allows the cost table to
be stored in 500 MB instead of over 1 TB. The offline computational savings are even
more significant.
|
S c |
and
|
S u |
Search WWH ::




Custom Search