Table 7.2 Variables and their associated ranges used in sequential CART for the TTBU problem.

Variable                   Description                               Range            RL component
α_mag                      Base (input-hidden layer) learning rate   [0.0001, 0.01]   Neural network
α_ratio                    Learning rate ratio                       [2.0, 5.0]       Neural network
λ                          Temporal discount factor                  [0.4, 0.7]       TD(λ) algorithm
γ                          Next-state discount factor                [0.96, 0.99]     TD(λ) algorithm
P (action exploitation)    —                                         [0.85, 0.97]     TD(λ) algorithm
and based on preliminary testing. We do not claim that 51 nodes is an ideal number of nodes in the hidden layer, and experimentation including this as a variable could be performed, but removing this variable reduces the complexity of the experimentation. The input layer had five nodes (for the five state variables), and the output layer had nine nodes (for the nine possible actions). Prior to passing the state into the neural network, the state variables x and y were scaled to [−0.1, 0.1] (relative to the domain bounds of x, y ∈ [−100, 100]) so that they were on a similar scale as the truck angles θ₀, θ₂, and θ₄.
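The scaling described above can be sketched as follows. This is a minimal illustration, not the authors' code; the function name `scale_state` and the state ordering (x, y first, then the three angles) are assumptions for the example.

```python
import numpy as np

def scale_state(state):
    """Scale the x and y state variables (first two entries) from the
    domain bounds [-100, 100] down to [-0.1, 0.1], leaving the truck
    angles unchanged, before the state is passed to the neural network."""
    scaled = np.asarray(state, dtype=float).copy()
    scaled[:2] /= 1000.0  # maps [-100, 100] onto [-0.1, 0.1]
    return scaled

# Example: a state at the corner of the position domain.
s = scale_state([100.0, -100.0, 0.3, -0.1, 0.2])
```

Dividing by 1000 is the linear map implied by the two ranges; any equivalent affine rescaling would serve the same purpose of keeping all five inputs on a similar scale.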
7.2 Sequential CART
We investigated convergence in the TTBU domain for five parameters of the neural network and of the TD(λ) algorithm. The parameters and their initial ranges are shown in Table 7.2. These ranges are slightly smaller than those used in the previous problems, in order to focus the experiment given the longer simulation times of the TTBU domain. The ranges were chosen based on prior experience with these methods, with little excess range on either end of the commonly used parameter values, yet they still define a rather large parameter space.
The parameters used in the sequential CART modeling are shown in Table 7.3. Each design point evaluated using sequential CART consisted of having the agent attempt to learn the TTBU domain in 10,000 episodes, where an episode is one attempt at backing the tandem trailer truck to the goal location. The initial experimental design therefore consisted of 125 unique design points, with 3 replicates each, totaling 375 initial runs, and was generated using Latin hypercube sampling (LHS). Subsequent designs for each iteration of the sequential CART algorithm were also LHS designs, consisting of 25 design points, again with 3 replicates each, for a total of 75 runs. Only three iterations of sequential CART were used because we allowed fewer design points to fall into each leaf. When fewer design points are allowed in each leaf, the CART model will often have more leaves, which results in more parameter subregions to explore in subsequent iterations, thus increasing the computation time.
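The initial LHS design over the Table 7.2 ranges can be sketched as below. This is an illustrative reconstruction, not the authors' tooling; it assumes SciPy's `scipy.stats.qmc` module as one readily available LHS implementation.

```python
import numpy as np
from scipy.stats import qmc

# Lower and upper bounds of the five parameters from Table 7.2:
# alpha_mag, alpha_ratio, lambda, gamma, P(action exploitation).
lower = [0.0001, 2.0, 0.4, 0.96, 0.85]
upper = [0.01,   5.0, 0.7, 0.99, 0.97]

# 125 unique design points in the unit hypercube, mapped to the ranges.
sampler = qmc.LatinHypercube(d=5, seed=0)
design = qmc.scale(sampler.random(n=125), lower, upper)

# Each design point is replicated 3 times, giving the 375 initial runs.
runs = np.repeat(design, 3, axis=0)
```

Each subsequent sequential-CART iteration would generate a smaller LHS design of the same form (25 points, 3 replicates) restricted to the parameter subregions identified by the CART leaves.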