3 Markov Decision Processes
An MDP is defined by a transition function $T$ and cost function $C$. The probability of transitioning from state $s$ to state $s'$ after executing action $a$ is given by $T(s, a, s')$. The immediate cost when executing $a$ from $s$ is given by $C(s, a)$. In this paper, the state space $S$ and action space $A$ are finite [21, 3].
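As a concrete illustration, a finite MDP of this form can be stored as tables indexed by state and action. The two-state, two-action problem below is purely hypothetical, chosen only to show the shape of $T(s, a, s')$ and $C(s, a)$:

```python
# Toy finite MDP (hypothetical numbers): states 0..1, actions 0..1.
# T[s][a][s2] = probability of moving from s to s2 under action a.
T = [
    [[0.9, 0.1], [0.2, 0.8]],  # transitions from state 0
    [[0.5, 0.5], [0.0, 1.0]],  # transitions from state 1
]
# C[s][a] = immediate cost of executing action a from state s.
C = [
    [1.0, 0.5],
    [0.0, 2.0],
]

# Sanity check: each row T[s][a] is a probability distribution over s'.
for s in range(2):
    for a in range(2):
        assert abs(sum(T[s][a]) - 1.0) < 1e-12
```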
A policy is a function $\pi$ that maps states to actions. The expected sum of immediate costs when following $\pi$ for $K$ steps starting from state $s$ is denoted $J_K(s)$, often called the cost-to-go function. The solution to an MDP with a horizon of $K$ is a policy $\pi^*_K$ that minimizes the cost to go from every state.
One way to compute $\pi^*_K$ is to first compute $J_K$, the cost-to-go function for the optimal policy, using a dynamic programming algorithm known as value iteration. The function $J_0(s)$ is set to zero for all states $s$. If the state space includes terminal states with immediate cost $C(s)$, then $J_0(s) = C(s)$ for those terminal states. The function $J_k(s)$ is computed from $J_{k-1}$ as follows:
\[
J_k(s) = \min_a \Big[ C(s, a) + \sum_{s'} T(s, a, s') \, J_{k-1}(s') \Big] . \tag{1}
\]
The iteration continues until horizon $K$.
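The backup in Eq. (1) can be sketched in a few lines of Python. All names and the toy tables below are hypothetical, used only to make the recursion concrete:

```python
def value_iteration(T, C, K):
    """Apply the backup of Eq. (1) K times, starting from J_0(s) = 0."""
    n_states = len(C)
    n_actions = len(C[0])
    J = [0.0] * n_states  # J_0(s) = 0 for all states s
    for _ in range(K):
        # J_k(s) = min_a [ C(s,a) + sum_{s'} T(s,a,s') * J_{k-1}(s') ]
        J = [
            min(
                C[s][a] + sum(T[s][a][s2] * J[s2] for s2 in range(n_states))
                for a in range(n_actions)
            )
            for s in range(n_states)
        ]
    return J

# Toy two-state, two-action MDP (hypothetical numbers).
T = [[[0.9, 0.1], [0.2, 0.8]], [[0.5, 0.5], [0.0, 1.0]]]
C = [[1.0, 0.5], [0.0, 2.0]]
J = value_iteration(T, C, K=3)  # cost-to-go for a 3-step horizon
```

Note that with $J_0 = 0$, a single backup reduces to $J_1(s) = \min_a C(s, a)$, which is a useful check on an implementation.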
The expected cost to go when executing $a$ from $s$ and then continuing with an optimal policy for $K - 1$ steps is given by
\[
J_K(s, a) = C(s, a) + \sum_{s'} T(s, a, s') \, J_{K-1}(s') . \tag{2}
\]
An optimal policy may be obtained directly from $J_K(s, a)$:
\[
\pi^*_K(s) = \mathop{\arg\min}_a J_K(s, a) . \tag{3}
\]
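Equations (2) and (3) can be sketched together: one backup on top of $J_{K-1}$ gives the action values, and a per-state argmin gives the policy. The tables and the value of J_prev below are hypothetical toy numbers:

```python
def q_backup(T, C, J_prev):
    """J_K(s, a) = C(s,a) + sum_{s'} T(s,a,s') * J_{K-1}(s'), as in Eq. (2)."""
    n_states, n_actions = len(C), len(C[0])
    return [
        [
            C[s][a] + sum(T[s][a][s2] * J_prev[s2] for s2 in range(n_states))
            for a in range(n_actions)
        ]
        for s in range(n_states)
    ]

def greedy_policy(Q):
    """pi*_K(s) = argmin_a J_K(s, a), as in Eq. (3)."""
    return [min(range(len(row)), key=row.__getitem__) for row in Q]

# Hypothetical toy tables and a previously computed (K-1)-step cost-to-go.
T = [[[0.9, 0.1], [0.2, 0.8]], [[0.5, 0.5], [0.0, 1.0]]]
C = [[1.0, 0.5], [0.0, 2.0]]
J_prev = [0.5, 0.0]
Q = q_backup(T, C, J_prev)   # Q[s][a] = J_K(s, a)
pi = greedy_policy(Q)        # one action index per state
```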
If the state space contains continuous variables, which is common for collision avoid-
ance problems, the state space can be discretized using a multi-dimensional grid or
simplex scheme [9]. The transition function
T
(
x
,a,
x
)
in continuous space can be
translated into a discrete transition function
T
(
s, a, s
)
using a variety of sampling and
interpolation methods [15]. Once the state space, transition function, and cost function
have been discretized,
J
∗
(
s, a
)
may be computed for each discrete state
s
and action
a
. For a continuous state
x
and action
a
,
J
∗
(
x
,a
)
may be approximated using, for ex-
ample, multilinear interpolation. The best action to execute from continuous state
x
is
simply
argmin
a
J
∗
(
x
,a
)
.
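The interpolation step can be sketched for the one-dimensional special case of multilinear interpolation: given a table of $J^*$ values at grid points, a continuous state falls in one grid cell and its value is a weighted average of the two cell endpoints. The grid and table below are hypothetical:

```python
import bisect

def interp_J(grid, J_table, x, a):
    """Linearly interpolate J*(x, a) from values J_table[i][a] at sorted grid points grid[i].
    This is the 1-D special case of multilinear interpolation."""
    i = bisect.bisect_right(grid, x) - 1
    i = max(0, min(i, len(grid) - 2))            # clamp x into a valid cell
    w = (x - grid[i]) / (grid[i + 1] - grid[i])  # fractional position within the cell
    return (1 - w) * J_table[i][a] + w * J_table[i + 1][a]

def best_action(grid, J_table, x, n_actions):
    """argmin_a J*(x, a) for a continuous state x."""
    return min(range(n_actions), key=lambda a: interp_J(grid, J_table, x, a))

# Hypothetical 1-D grid and discrete cost-to-go table J_table[grid index][action].
grid = [0.0, 1.0, 2.0]
J_table = [[3.0, 1.0], [2.0, 2.0], [0.0, 4.0]]
# Midway between two grid points, each action's cost is the average of its endpoints.
```

In higher dimensions the same idea applies cell by cell, with $2^d$ corner weights per $d$-dimensional grid cell, which is one reason grid-based discretization scales poorly with dimension.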
Discretizing the full state space can result in a large number of discrete states, exponential in the number of dimensions, which makes computing $J^*$ infeasible for many problems. This "curse of dimensionality" [1] has led to a variety of different approximation methods [20].
4 Partial Control
This paper explores a new solution technique for partially controlled MDPs that is applicable to certain collision avoidance problems. It may be applied to interception-seeking