Collision Avoidance Using Partially Controlled Markov Decision Processes

Mykel J. Kochenderfer and James P. Chryssanthacopoulos
Lincoln Laboratory, Massachusetts Institute of Technology
244 Wood Street, Lexington, MA 02420, U.S.A.
{mykelk,chryssanthacopoulos}@ll.mit.edu
Abstract. Optimal collision avoidance in stochastic environments requires accounting for the likelihood and costs of future sequences of outcomes in response to different sequences of actions. Prior work has investigated formulating the problem as a Markov decision process, discretizing the state space, and solving for the optimal strategy using dynamic programming. Experiments have shown that such an approach can be very effective, but scaling to higher-dimensional problems can be challenging due to the exponential growth of the discrete state space. This paper presents an approach that can greatly reduce the complexity of computing the optimal strategy in problems where only some of the dimensions of the problem are controllable. The approach is applied to aircraft collision avoidance, where the system must recommend maneuvers to an imperfect pilot.
Keywords: Markov decision processes, Dynamic programming, Collision avoidance.
1 Introduction
Manually constructing a robust collision avoidance system, whether for an autonomous or a human-controlled vehicle, is challenging because the future effects of the system cannot be known exactly. Due to their safety-critical nature, collision avoidance systems must maintain a high degree of reliability while minimizing unnecessary path deviation. Recent work has investigated formulating the problem of collision avoidance as a Markov decision process (MDP) and solving for the optimal strategy using dynamic programming (DP) [14,15,25]. One limitation of this approach is that the computation and memory requirements grow exponentially with the dimensionality of the state space. Hence, these studies focused on MDP formulations that capture only a subset of the relevant state variables, at the expense of impaired performance.
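To make the baseline being described concrete, the following Python sketch (our illustration, not code from the paper) runs generic value iteration over a discretized MDP with random placeholder transition and reward arrays; the bins_per_dim ** n_dims arithmetic in the comments makes the exponential growth of the discrete state space explicit. All names and sizes are hypothetical.

import numpy as np

def value_iteration(T, R, gamma=0.95, tol=1e-6):
    """Generic value iteration for a discrete MDP.

    T: transition model, shape (n_actions, n_states, n_states), row-stochastic.
    R: rewards, shape (n_states, n_actions).
    Returns the optimal value function and a greedy policy.
    """
    n_states = T.shape[1]
    V = np.zeros(n_states)
    while True:
        # Q[s, a] = R[s, a] + gamma * sum_s' T[a, s, s'] * V[s']
        Q = R + gamma * np.einsum('ast,t->sa', T, V)
        V_new = Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, Q.argmax(axis=1)
        V = V_new

# The discrete state space is the Cartesian product of the per-dimension
# grids, so it has bins_per_dim ** n_dims states: each added state dimension
# multiplies the table size (and per-iteration work) by another factor.
bins_per_dim, n_dims, n_actions = 4, 3, 3   # hypothetical small problem
n_states = bins_per_dim ** n_dims           # 4**3 = 64 discrete states

rng = np.random.default_rng(0)
T = rng.dirichlet(np.ones(n_states), size=(n_actions, n_states))
R = rng.standard_normal((n_states, n_actions))
V, policy = value_iteration(T, R)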
This paper presents a new approach for significantly reducing the computation and memory requirements for partially controlled Markov decision processes. The approach involves decomposing the problem into two separate subproblems, one controlled and one uncontrolled, that can be solved independently offline using dynamic programming, as sketched below.
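The following Python sketch (hypothetical, not the authors' implementation) illustrates the flavor of such a decomposition: the state is split into a controlled component, whose dynamics depend on the chosen action, and an uncontrolled component, whose dynamics do not. Each subproblem is solved offline on its own grid, so the tables scale with the sum rather than the product of the subproblem sizes; how the two offline solutions are combined at decision time is developed in the body of the paper.

import numpy as np

def solve_controlled(Tc, Rc, horizon):
    """Finite-horizon DP over the controlled subproblem alone.

    Tc: (n_actions, n_c, n_c) transitions of the controlled state under each
    action; Rc: (n_c, n_actions) rewards. Returns V[k, s], the optimal value
    with k steps remaining.
    """
    n_c = Tc.shape[1]
    V = np.zeros((horizon + 1, n_c))
    for k in range(1, horizon + 1):
        Q = Rc + np.einsum('ast,t->sa', Tc, V[k - 1])
        V[k] = Q.max(axis=1)
    return V

def solve_uncontrolled(Tu, p0, horizon):
    """Propagate the uncontrolled Markov chain offline.

    Tu: (n_u, n_u) row-stochastic transitions (no action dependence);
    p0: (n_u,) initial distribution. Returns dist[k], the distribution over
    the uncontrolled state after k steps.
    """
    dist = np.zeros((horizon + 1, len(p0)))
    dist[0] = p0
    for k in range(horizon):
        dist[k + 1] = dist[k] @ Tu
    return dist

# Hypothetical sizes: a joint solve would sweep n_c * n_u = 50 * 40 = 2000
# states per action; the decomposition solves one problem of size 50 and one
# of size 40.
rng = np.random.default_rng(1)
n_c, n_u, n_actions, horizon = 50, 40, 3, 20
Tc = rng.dirichlet(np.ones(n_c), size=(n_actions, n_c))
Rc = rng.standard_normal((n_c, n_actions))
Tu = rng.dirichlet(np.ones(n_u), size=n_u)
p0 = np.eye(n_u)[0]          # uncontrolled process starts in state 0
Vc = solve_controlled(Tc, Rc, horizon)
pu = solve_uncontrolled(Tu, p0, horizon)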
This work is sponsored by the Federal Aviation Administration under Air Force Contract #FA8721-05-C-0002. Opinions, interpretations, conclusions, and recommendations are those of the authors and are not necessarily endorsed by the United States Government.