Collision Avoidance Using Partially Controlled Markov Decision Processes - Agents and Artificial Intelligence

Information Technology Reference

In-Depth Information

4.3

Uncontrolled Subproblem

Solving the uncontrolled subproblem involves using the probabilistic model of the un-

controlled dynamics to infer a distribution over τ for each uncontrolled state s u .This

distribution is referred to as the entry time distribution because it represents the distri-

bution over the time for s u to enter G . The probability that s u enters G in τ steps is

denoted D τ ( s u ) and may be computed using dynamic programming. The probability

that τ =0 is given by

D 0 ( s u )= 1

if s u ∈

G ,

(6)

otherwise.

The probability that τ = k ,for k> 0 , is computed from D k− 1 as follows:

D k ( s u )= 0

if s u ∈

G ,

s u

(7)

T ( s u ,s u ) D k− 1 ( s u )

otherwise.

Depending on s u , there may be some probability that s u does not enter G within K

steps. This probability is denoted D K ( s u ) and may be computed from D 0 ,...,D K :

D K ( s u )=1

−

D k ( s u ) .

(8)

k =0

The sequence D 0 ,...,D K ,D K is stored in a table with O ( K

) entries. Multilin-

ear interpolation of the distributions may be used to determine D τ ( x u ) at an arbitrary

continuous state x u .

S u |

4.4

Online Solution

After J 0 ,...,J K ,J K and D 0 ,...,D K ,D K have been computed offline, they are used

together online to determine the approximately optimal action to execute from the cur-

rent state. For any discrete state s in the original state space, J K ( s, a ) may be computed

as follows:

J K ( s, a )= D K ( s u ) J K ( s c ,a )+

D k ( s u ) J k ( s c ,a ) ,

(9)

k =0

where s u is the discrete uncontrolled state and s c is the discrete controlled state asso-

ciated with s . Combining the controlled and uncontrolled solutions online in this way

requires time linear in the size of the horizon. Multilinear interpolation can be used

to estimate J K ( x ,a ) for an arbitrary state x , and from this the optimal action may be

obtained.

The memory required to store J K ( s, a ) is O (

S c ||

S u |

) . However, the method in

this section allows the solution to be represented using O ( K

S c |

+ K

S u |

) storage,

which can be a tremendous savings when

are large. For the collision

avoidance problem discussed in the next section, this method allows the cost table to

be stored in 500 MB instead of over 1 TB. The offline computational savings are even

more significant.

S c |

and

S u |

Agents and Artificial Intelligence

Search WWH ::

Custom Search

Home