Fig. 6.1 The state of the truck is defined by the rear trailer position (x, y), the trailer angle θ_T, and the cab angle θ_C. The goal of the problem is to back the truck into the loading dock at (x, y) = (0, 0), where θ_T = 0.
θ_T = θ_T − arcsin( A · sin(θ_C) / L_T )
θ_C = θ_C + arcsin( v · sin(u) / (L_C + L_T) )

where A = v · cos(u), B = A · cos(θ_C), v = 3, L_T = 14 (trailer length), and L_C = 6 (cab length). The wheel angle relative to the cab angle is specified by u (radians), and three discrete actions were allowed: u = {−1, 0, 1}. The truck velocity was not taken into account, as backing the trailer is assumed to be a slow process. The truck was restricted to the domain boundaries x = [0, 200] and y = [−100, 100]. The goal of this problem was to have the trailer positioned at the loading dock with a specific orientation within a fixed number of time steps. This goal criterion can be represented as: x = 0, y = 0, and θ_T = 0.
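The state update above can be sketched as a small simulation step. The angle updates and the constants v = 3, L_T = 14, and L_C = 6 follow the equations in the text; the position update for (x, y) is not shown on this page, so the form used here (moving the trailer backwards along its heading by B per step) is an assumption.

```python
import math

# Constants from the text: velocity, trailer length, cab length.
V, L_T, L_C = 3.0, 14.0, 6.0

def step(x, y, theta_t, theta_c, u):
    """Advance the truck state one time step for wheel angle u (radians)."""
    a = V * math.cos(u)
    b = a * math.cos(theta_c)
    # Assumed position update: the rear trailer position moves backwards
    # along the trailer heading (only the angle updates appear in the text).
    x_new = x - b * math.cos(theta_t)
    y_new = y - b * math.sin(theta_t)
    theta_t_new = theta_t - math.asin(a * math.sin(theta_c) / L_T)
    theta_c_new = theta_c + math.asin(V * math.sin(u) / (L_C + L_T))
    return x_new, y_new, theta_t_new, theta_c_new
```

With the wheels straight (u = 0) and both angles at zero, the truck simply backs up along the x-axis by v = 3 per step, which is a quick sanity check on the equations.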
6.1 Reinforcement Learning Implementation
The truck backer-upper problem can be viewed as a reinforcement learning problem
where each learning run attempts to back up the truck from some initial state to a
goal state, and where many learning runs are used to learn how to control the truck
at different locations and orientations throughout the domain. More specifically, the
truck begins at a random location and orientation, and the wheels of the truck are
controlled to back up the truck to a specific location and orientation. When the truck
reaches the goal, positive feedback is provided, indicating that the control strategy
in that learning run was good, whereas if the truck does not reach the goal for some
reason, negative feedback is provided.
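The terminal feedback described above can be sketched as a simple reward function. The position and angle tolerances below are illustrative assumptions, not values given in the text, which only states that success yields positive feedback and failure negative feedback.

```python
import math

# Assumed tolerances for "reaching the goal"; the text does not specify them.
GOAL_TOL_POS, GOAL_TOL_ANGLE = 1.0, 0.1

def terminal_reward(x, y, theta_t, out_of_bounds, out_of_time):
    """Feedback at the end of a learning run, as described in the text."""
    if math.hypot(x, y) < GOAL_TOL_POS and abs(theta_t) < GOAL_TOL_ANGLE:
        return 1.0   # goal reached: positive feedback
    if out_of_bounds or out_of_time:
        return -1.0  # failed run: negative feedback
    return 0.0       # episode still in progress
```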
The temporal difference algorithm TD(λ) (Sect. 2.2.3) was used to train a three-layer neural network to learn the value function V(x, a) that approximates the value of being in state x and taking action a at the current time step. The neural network had four inputs, corresponding to the four state variables. This is a control-type problem that naturally lends itself to using a neural network with an output node for each of the possible actions, and thus the neural network had three output nodes. The number of nodes in the hidden layer was a variable in the experimental design.
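The network shape described above (four state inputs, a hidden layer of configurable size, one output per discrete action) can be sketched as follows. The tanh activation, the weight initialization range, and the plain forward pass are assumptions; the text only fixes the layer sizes.

```python
import math
import random

N_IN, N_OUT = 4, 3  # four state variables in, one value estimate per action out

def init_net(n_hidden, seed=0):
    """Randomly initialize a three-layer network with n_hidden hidden nodes."""
    rng = random.Random(seed)
    w1 = [[rng.uniform(-0.5, 0.5) for _ in range(N_IN)] for _ in range(n_hidden)]
    w2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(N_OUT)]
    return w1, w2

def forward(net, state):
    """Return one value estimate per action for a 4-element state vector."""
    w1, w2 = net
    hidden = [math.tanh(sum(w * s for w, s in zip(row, state))) for row in w1]
    return [sum(w * h for w, h in zip(row, hidden)) for row in w2]
```

Because the hidden-layer size was an experimental variable, `n_hidden` is left as a parameter rather than fixed.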
The x and y components of the state vector were scaled over [−3, 3] based on the