Information Technology Reference
In-Depth Information
or goal-oriented problems as well by incorporating negative costs. So long as the prob-
lem satisfies a set of assumptions, this solution method will provide a finite-horizon
solution. The approach involves independently solving a controlled subproblem and an
uncontrolled subproblem and combining the results online to identify the approximately
optimal action.
4.1
Assumptions
It is assumed that the state is represented by a set of variables, some controlled and some
uncontrolled. The state space of the controlled variables is denoted S c , and the state
space of the uncontrolled variables is denoted S u . The state of the controlled variables
at time t is denoted s c ( t ) , and the state of the uncontrolled variables at time t is denoted
s u ( t ) . The solution technique may be applied when the following three assumptions
hold:
1. The state s u ( t +1) depends only upon s u ( t ) . The probability of transitioning from
s u to s u is given by T ( s u ,s u ) .
2. The episode terminates when s u
S u with immediate cost C ( s c ) .
3. In nonterminal states, the immediate cost c ( t +1) depends only upon s c ( t ) and
a ( t ) . If the controlled state is s c and action a is executed, the immediate cost is
denoted C ( s c ,a ) .
G
4.2
Controlled Subproblem
Solving the controlled subproblem involves computing the optimal policy for the con-
trolled variables under the assumption that the time until s u enters G , denoted τ ,is
known. In an airborne collision avoidance context, τ may be the number of steps until
another aircraft comes within 500 ft horizontally. Of course, τ cannot be determined
exactly from s u ( t ) because it depends upon an event that occurs in the future, but this
will be addressed by the uncontrolled subproblem (Section 4.3).
The cost to go from s c given τ is denoted J τ ( s c ) .Theseries J 0 ,...,J K is computed
recursively, starting with J 0 ( s c )= C ( s c ) and iterating as follows:
C ( s c ,a )+
s c
.
T ( s c ,a,s c ) J k− 1 ( s c )
J k ( s c )=min
a
(4)
The expected cost to go from s c when executing a for one step and then following the
optimal policy is given by
J k ( s c ,a )= C ( s c ,a )+
s c
T ( s c ,a,s c ) J k− 1 ( s c ) .
(5)
The K -step expected cost to go when τ>K is denoted J K . It is computed by ini-
tializing J 0 ( s c )=0 for all states and iterating equation 4 to horizon K .Theseries
J 0 ,...,J K ,J K is saved in a table in memory, requiring O ( K
|
A
||
S c |
) entries.
 
Search WWH ::




Custom Search