Collision Avoidance Using Partially Controlled Markov Decision Processes - Agents and Artificial Intelligence

Information Technology Reference

In-Depth Information

or goal-oriented problems as well by incorporating negative costs. So long as the prob-

lem satisfies a set of assumptions, this solution method will provide a finite-horizon

solution. The approach involves independently solving a controlled subproblem and an

uncontrolled subproblem and combining the results online to identify the approximately

optimal action.

4.1

Assumptions

It is assumed that the state is represented by a set of variables, some controlled and some

uncontrolled. The state space of the controlled variables is denoted S c , and the state

space of the uncontrolled variables is denoted S u . The state of the controlled variables

at time t is denoted s c ( t ) , and the state of the uncontrolled variables at time t is denoted

s u ( t ) . The solution technique may be applied when the following three assumptions

hold:

1. The state s u ( t +1) depends only upon s u ( t ) . The probability of transitioning from

s u to s u is given by T ( s u ,s u ) .

2. The episode terminates when s u ∈

S u with immediate cost C ( s c ) .

3. In nonterminal states, the immediate cost c ( t +1) depends only upon s c ( t ) and

a ( t ) . If the controlled state is s c and action a is executed, the immediate cost is

denoted C ( s c ,a ) .

⊂

4.2

Controlled Subproblem

Solving the controlled subproblem involves computing the optimal policy for the con-

trolled variables under the assumption that the time until s u enters G , denoted τ ,is

known. In an airborne collision avoidance context, τ may be the number of steps until

another aircraft comes within 500 ft horizontally. Of course, τ cannot be determined

exactly from s u ( t ) because it depends upon an event that occurs in the future, but this

will be addressed by the uncontrolled subproblem (Section 4.3).

The cost to go from s c given τ is denoted J τ ( s c ) .Theseries J 0 ,...,J K is computed

recursively, starting with J 0 ( s c )= C ( s c ) and iterating as follows:

⎡

⎣ C ( s c ,a )+

s c

⎤

⎦ .

T ( s c ,a,s c ) J k− 1 ( s c )

J k ( s c )=min

(4)

The expected cost to go from s c when executing a for one step and then following the

optimal policy is given by

J k ( s c ,a )= C ( s c ,a )+

s c

T ( s c ,a,s c ) J k− 1 ( s c ) .

(5)

The K -step expected cost to go when τ>K is denoted J K . It is computed by ini-

tializing J 0 ( s c )=0 for all states and iterating equation 4 to horizon K .Theseries

J 0 ,...,J K ,J K is saved in a table in memory, requiring O ( K

S c |

) entries.

Agents and Artificial Intelligence

Search WWH ::

Custom Search

Home