Collision Avoidance Using Partially Controlled Markov Decision Processes - Agents and Artificial Intelligence

5.2 Controlled Subproblem

The controlled subproblem, formulated as an MDP, is defined by the available actions,

the dynamics, and the cost function. The dynamics are determined by the pilot response

model and aircraft dynamic model. The cost function takes into account both safety and

operational considerations. In addition to describing these components of the MDP, this

section discusses the resulting optimal policy.

Resolution Advisories. The airborne collision avoidance system may choose to issue

one of two different initial advisories: climb at least 1500 ft/min or descend at least

1500 ft/min. Following the initial advisory, the system may choose to either terminate,

reverse, or strengthen the advisory. An advisory that has been reversed requires a vertical

rate of 1500 ft/min in the opposite direction of the original advisory. An advisory

that has been strengthened requires a vertical rate of 2500 ft/min in the direction of the

original advisory. After an advisory has been strengthened, it can then be weakened to

reduce the required vertical rate to 1500 ft/min in the direction of the original advisory.
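
The advisory set described above can be sketched as a small data structure. This is only an illustrative sketch: the names Advisory, reverse, strengthen, and weaken are hypothetical, and the code does not attempt to encode which sequences of advisories are permitted beyond the ones named in the text.

```python
from dataclasses import dataclass

INITIAL_RATE_FPM = 1500       # required rate of an initial or reversed advisory
STRENGTHENED_RATE_FPM = 2500  # required rate of a strengthened advisory


@dataclass(frozen=True)
class Advisory:
    """Hypothetical representation of an active resolution advisory."""
    sense: int     # +1 for climb, -1 for descend
    rate_fpm: int  # required vertical rate magnitude in ft/min

    @property
    def target_fpm(self) -> int:
        """Signed vertical rate the pilot is asked to reach (ft/min)."""
        return self.sense * self.rate_fpm


def initial_advisories():
    """The two initial advisories: climb or descend at least 1500 ft/min."""
    return [Advisory(+1, INITIAL_RATE_FPM), Advisory(-1, INITIAL_RATE_FPM)]


def reverse(advisory: Advisory) -> Advisory:
    """Reversal: 1500 ft/min in the opposite direction."""
    return Advisory(-advisory.sense, INITIAL_RATE_FPM)


def strengthen(advisory: Advisory) -> Advisory:
    """Strengthening: 2500 ft/min in the same direction."""
    return Advisory(advisory.sense, STRENGTHENED_RATE_FPM)


def weaken(advisory: Advisory) -> Advisory:
    """Weakening applies only to a strengthened advisory: back to 1500 ft/min."""
    if advisory.rate_fpm != STRENGTHENED_RATE_FPM:
        raise ValueError("only a strengthened advisory can be weakened")
    return Advisory(advisory.sense, INITIAL_RATE_FPM)
```

Termination is left out of the sketch because it simply removes the active advisory; it could be represented by None.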

Dynamic Model. The state is represented using four variables:

- $h$: altitude of the intruder relative to the own aircraft,

- $\dot{h}_0$: vertical rate of the own aircraft,

- $\dot{h}_1$: vertical rate of the intruder aircraft, and

- $s_{RA}$: the state of the resolution advisory.

The discrete variable $s_{RA}$ contains the necessary information to model the pilot

response, which includes the active advisory and the time to execution by the pilot.

Five seconds are required for the pilot to begin responding to an initial advisory. The

pilot then applies a 1/4 g acceleration to comply with the advisory. Subsequent advisories

are followed with a 1/3 g acceleration after a three-second delay. When an

advisory is not active, the pilot applies an acceleration selected at every step from a

zero-mean Gaussian with 3 ft/s² standard deviation. At each step, the intruder pilot

independently applies a random acceleration from a zero-mean Gaussian with 3 ft/s²

standard deviation.
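
The pilot response model can be illustrated with a short simulation of the own aircraft's vertical rate. The sketch below rests on several assumptions that are not stated in this section: a one-second time step, a vertical rate held constant during the response delay, and g taken as 32.174 ft/s². The function names are hypothetical.

```python
import random

G_FT_S2 = 32.174          # standard gravity in ft/s^2 (assumed value)
INITIAL_DELAY_S = 5.0     # delay before responding to an initial advisory
INITIAL_ACCEL_G = 0.25    # 1/4 g response to an initial advisory
SUBSEQUENT_DELAY_S = 3.0  # delay for subsequent advisories (listed for reference)
SUBSEQUENT_ACCEL_G = 1 / 3
NOISE_STD_FT_S2 = 3.0     # std. dev. of the random acceleration


def comply(rate_fpm, target_fpm, accel_g, dt=1.0):
    """Move the vertical rate (ft/min) toward the target without overshoot."""
    # "At least" semantics: a rate beyond the target already complies.
    if target_fpm > 0:
        complies = rate_fpm >= target_fpm
    else:
        complies = rate_fpm <= target_fpm
    if complies:
        return rate_fpm
    max_delta = accel_g * G_FT_S2 * 60.0 * dt  # ft/min gained in one step
    delta = target_fpm - rate_fpm
    if abs(delta) <= max_delta:
        return target_fpm
    return rate_fpm + (max_delta if delta > 0 else -max_delta)


def random_rate_change(rng, dt=1.0):
    """Vertical rate change (ft/min) from one zero-mean Gaussian acceleration."""
    return rng.gauss(0.0, NOISE_STD_FT_S2) * 60.0 * dt


def simulate_initial_advisory(rate_fpm, target_fpm, steps, dt=1.0):
    """Vertical rate history after an initial advisory issued at t = 0."""
    rates = [rate_fpm]
    for k in range(steps):
        # Assumption: the rate is held constant until the delay has elapsed.
        if k * dt >= INITIAL_DELAY_S:
            rate_fpm = comply(rate_fpm, target_fpm, INITIAL_ACCEL_G, dt)
        rates.append(rate_fpm)
    return rates
```

Under these assumptions, simulate_initial_advisory(0.0, 1500.0, steps=10) keeps the rate at zero for the first five steps and then increases it by roughly 483 ft/min per step until 1500 ft/min is reached.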

The continuous state variables are discretized according to the scheme in Table 1.

The discrete state transition probabilities were computed using sigma-point sampling

and multilinear interpolation [15]. This discretization scheme produces a discrete model

with 213 thousand discrete states.
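
To illustrate the interpolation step, the following sketch distributes a continuous point over the neighboring grid vertices using multilinear interpolation weights. It is not the implementation of [15]: the sigma-point sampling step is omitted, the helper names are hypothetical, and points outside the grid are clamped to its boundary as an assumption. The grids follow the edges listed in Table 1.

```python
from bisect import bisect_right
from itertools import product


def grid(lo, hi, step):
    """Evenly spaced grid edges from lo to hi inclusive."""
    return [lo + i * step for i in range((hi - lo) // step + 1)]


def axis_weights(x, edges):
    """Indices and weights of the two edges that bracket x on one axis."""
    x = min(max(x, edges[0]), edges[-1])  # assumption: clamp to the grid
    i = min(bisect_right(edges, x) - 1, len(edges) - 2)
    frac = (x - edges[i]) / (edges[i + 1] - edges[i])
    return [(i, 1.0 - frac), (i + 1, frac)]


def multilinear_weights(point, grids):
    """Map a continuous point to {vertex index tuple: weight}."""
    per_axis = [axis_weights(x, g) for x, g in zip(point, grids)]
    weights = {}
    for combo in product(*per_axis):
        index = tuple(i for i, _ in combo)
        weight = 1.0
        for _, w in combo:
            weight *= w
        if weight > 0.0:  # drop vertices that receive no probability mass
            weights[index] = weight
    return weights


GRIDS = [
    grid(-1000, 1000, 100),  # relative altitude, ft
    grid(-2500, 2500, 250),  # own vertical rate, ft/min
    grid(-2500, 2500, 250),  # intruder vertical rate, ft/min
]
```

In a transition model of this kind, each sampled successor state would contribute its sample weight multiplied by these interpolation weights to the corresponding discrete states, so that the weights for one point always sum to one.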

Table 1. Controlled Variable Discretization

Variable      Grid Edges
$h$           −1000, −900, ..., 1000 ft
$\dot{h}_0$   −2500, −2250, ..., 2500 ft/min
$\dot{h}_1$   −2500, −2250, ..., 2500 ft/min
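
As a consistency check on the stated model size, the number of grid vertices implied by Table 1 can be counted directly. The number of values taken by the advisory state is not given in this section; the quotient computed below is only an inference from the 213 thousand total, and the variable names are hypothetical.

```python
def num_edges(lo, hi, step):
    """Number of evenly spaced grid edges from lo to hi inclusive."""
    return (hi - lo) // step + 1


n_h = num_edges(-1000, 1000, 100)        # relative altitude edges
n_own_rate = num_edges(-2500, 2500, 250)  # own vertical rate edges
n_intruder_rate = num_edges(-2500, 2500, 250)

# Vertices of the continuous-variable grid, before the advisory state.
continuous_vertices = n_h * n_own_rate * n_intruder_rate

# Approximate total quoted in the text ("213 thousand").
stated_total = 213_000

# Inferred, not stated in the source: advisory-state values per vertex.
implied_advisory_states = stated_total / continuous_vertices

print(n_h, n_own_rate, n_intruder_rate, continuous_vertices)
print(round(implied_advisory_states, 2))
```

Each continuous variable has 21 grid edges, giving 9261 vertices, so the stated total would correspond to roughly 23 values of the advisory state.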
