Table 5.2 Variables and their associated ranges used in sequential CART for the mountain car problem.

Variable   Description                               Range            RL component
α_mag      Base (input-hidden layer) learning rate   [0.0005, 0.01]   Neural network
α_ratio    Learning rate ratio                       [1.0, 8.0]       Neural network
λ          Temporal discount factor                  [0.0, 1.0]       TD(λ) algorithm
γ          Next-state discount factor                [0.9, 1.0]       TD(λ) algorithm
P          Probability of action exploitation        [0.6, 1.0]       TD(λ) algorithm
5.2 Sequential CART
Sequential CART modeling was used to identify convergent parameter subregions. Five parameters were studied in this procedure, two related to the neural network and three related to the TD(λ) algorithm (Table 5.2). The ranges of these parameters generally include what is recommended for reinforcement learning or neural networks, though they were expanded in order to find potentially useful parameter subregions outside of what is typically recommended.
Both the magnitude of the learning rates and the ratio of the learning rates between the input-hidden and hidden-output layers are studied here. Generally, only the magnitude of the learning rates is given consideration, and a single learning rate is used for the entire network. However, the weight update equations for the TD(λ) algorithm (and the back-propagation algorithm) naturally result in the gradient of the input-hidden layer having a smaller magnitude than that of the hidden-output layer, and there is evidence supporting the use of learning rates that are scaled between layers (LeCun et al. 1998; Embrechts et al. 2010). We are therefore interested in whether the ratio of the learning rates has an effect on reinforcement learning, and we include this parameter in the study. The learning rates of the network were set based on the variables α_mag, a base learning rate between the input and hidden layers of the network, and α_ratio, the ratio of the learning rates for each subsequent layer of the network with respect to the input-hidden layer.
the network with respect to the input-hidden layer. For example, in a three-layer
neural network, the learning rates of the input-hidden layer ( ʱ hi ) and the hidden-
output layer ( ʱ oh ), respectively, would be set as ʱ hi
ʱ mag
ʱ ratio . The
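A minimal sketch of this parameterization follows; the function name is illustrative and not taken from the original implementation. It returns one learning rate per weight layer, with every layer after the input-hidden layer scaled by α_ratio relative to the base rate:

```python
def layer_learning_rates(alpha_mag, alpha_ratio, n_weight_layers=2):
    """Sketch: per-layer learning rates from the two sampled variables.

    The input-hidden layer uses the base rate alpha_mag; each subsequent
    layer is scaled by alpha_ratio with respect to the input-hidden layer.
    For a three-layer network (two weight layers):
        alpha_hi = alpha_mag
        alpha_oh = alpha_mag * alpha_ratio
    """
    return [alpha_mag] + [alpha_mag * alpha_ratio] * (n_weight_layers - 1)

alpha_hi, alpha_oh = layer_learning_rates(alpha_mag=0.005, alpha_ratio=4.0)
```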
The parameters λ, γ, and P are all related to the TD(λ) algorithm (Sect. 2.2.3), where λ is the temporal discount factor, γ is a next-state discount factor, and P is an action selection exploration-exploitation trade-off parameter.
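To make the role of P concrete, the following sketch assumes a discrete action set and a list of estimated action values for the current state; the exact exploration scheme used in the original study is not shown here, so this is only one plausible reading of an exploitation probability:

```python
import random

def select_action(q_values, P):
    """Sketch: exploit with probability P, otherwise explore uniformly.

    q_values is a list of estimated action values for the current state
    (an assumption for illustration, not the original interface).
    """
    if random.random() < P:
        # Exploit: take the highest-valued action.
        return max(range(len(q_values)), key=lambda a: q_values[a])
    # Explore: choose an action uniformly at random.
    return random.randrange(len(q_values))
```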
The parameters and settings used in the sequential CART algorithm are listed in Table 5.3. These settings were chosen based on preliminary experimentation and use with the sequential CART process. The initial, or seed, experimental run consisted of 60 design points with 3 replications each, and each subsequent sub-experiment in the sequential CART procedure used 20 new design points, also with 3 replications each. All designs were generated using Latin hypercube sampling. The proportion of points within each design labeled as low was 0.80, and the required convergence rate for any low leaf node was 90%.
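A sketch of how the seed and sub-experiment designs might be generated, using the parameter ranges from Table 5.2; scipy's qmc module stands in here for whatever Latin hypercube generator was actually used:

```python
from scipy.stats import qmc

# Parameter ranges from Table 5.2
# (order: alpha_mag, alpha_ratio, lambda, gamma, P).
lower = [0.0005, 1.0, 0.0, 0.9, 0.6]
upper = [0.01,   8.0, 1.0, 1.0, 1.0]

def lhs_design(n_points, seed=None):
    """Generate n_points Latin hypercube design points over the five ranges."""
    sampler = qmc.LatinHypercube(d=len(lower), seed=seed)
    unit = sampler.random(n=n_points)      # points in the unit hypercube
    return qmc.scale(unit, lower, upper)   # rescale to the Table 5.2 ranges

seed_design = lhs_design(60)  # initial (seed) run: 60 design points
sub_design = lhs_design(20)   # each sequential CART sub-experiment: 20 points
# Each design point would then be run with 3 replications.
```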