The next-state discount factor γ should be in the upper range of [0.9, 1.0], and
although this parameter isn't given much consideration in the literature, this range
is consistent with what is suggested. The action exploration-exploitation trade-off
parameter ε does not have a consistent range over its parameter space that allows for
convergence. However, in three subregions (13, 25, and 32), this parameter should be
towards the upper end of [0.6, 1.0], which is consistent with most applications and
with the intuition that, in general, knowledge should be exploited. Although a high
value of ε may choose many erroneous actions early in the learning process, the correct
actions will eventually surface and can then be exploited thereafter to maintain high
performance. It is interesting that there are subregions in which ε has a wide range
or takes low values (subregions 6 and 11, respectively). Within subregion 6, ε is
the most influential parameter with respect to performance, and we find that larger
values do yield better performance based on the surface projections. Subregion 11
has α_mag as its most influential parameter, and performance seems to improve with
larger values of this parameter.
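The exploitation behavior described above can be sketched with a simple action-selection rule. This is a minimal illustration, not the book's implementation; here ε is taken as the probability of exploiting (choosing the greedy action), matching the convention that larger values mean more exploitation, and all names are illustrative.

```python
import random

def epsilon_greedy(q_values, epsilon, rng=random):
    """Select an action index from a list of estimated action values.

    Here epsilon is the probability of *exploiting* (taking the greedy
    action), so larger epsilon means more exploitation, consistent with
    the range [0.6, 1.0] discussed in the text.
    """
    if rng.random() < epsilon:
        # Exploit: choose the action with the highest estimated value.
        return max(range(len(q_values)), key=lambda a: q_values[a])
    # Explore: choose an action uniformly at random.
    return rng.randrange(len(q_values))
```

With a high ε, early erroneous value estimates may still drive many greedy choices, but as the estimates improve, the same rule exploits the correct actions thereafter.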
The magnitude of the learning rate of the neural network, α_mag, can be anywhere
in the range [5 × 10⁻⁴, 0.01] for some subregions, but it must be more constrained
in other subregions. The ratio of the learning rates, α_ratio, can also take on a large
and varying range; however, we do find that this ratio must not be close to one, which
supports Embrechts et al. (2010), and this makes sense based on the magnitudes of
the error gradients between layers. Additionally, when considering α_mag and α_ratio
together, especially for subregions 25 and 32, which have small ranges for each
parameter, we see that if α_mag is larger, then α_ratio must be smaller (subregion 25),
and vice versa (subregion 32). In terms of performance within the convergent
subregions, α_mag has a great influence in two of the subregions, whereas α_ratio has
little influence in any of the subregions.
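One way the learning-rate magnitude and ratio can interact is sketched below. The mapping from (α_mag, α_ratio) to per-layer rates is an assumption for illustration, not the book's exact scheme: the output layer uses α_mag, and each earlier layer's rate is scaled by α_ratio to reflect the differing back-propagated gradient magnitudes.

```python
def layer_learning_rates(alpha_mag, alpha_ratio, n_hidden_layers=1):
    """Illustrative per-layer learning rates (hypothetical convention).

    The output layer trains at alpha_mag; each successively earlier
    layer is scaled by alpha_ratio. With alpha_ratio far from one, the
    layers train at genuinely different rates, consistent with the
    error-gradient magnitudes between layers noted in the text.
    """
    return [alpha_mag * alpha_ratio ** k          # earlier layers first
            for k in range(n_hidden_layers, -1, -1)]
```

Under this convention, the observed trade-off (a larger α_mag paired with a smaller α_ratio, and vice versa) keeps the largest per-layer rate bounded, which is one plausible reading of the small ranges seen in subregions 25 and 32.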
The mountain car problem was explored in previous work using a more classical
experimental design, defined over a parameter space known from prior experience to
converge relatively well (Gatti et al. 2013) (Appendix B).
A direct comparison of the results from the sequential CART approach to this previous
work is not possible, though some general observations can be made. If all convergent parameter
subregions are considered together, the neural network learning rate parameters used
in the previous work are very similar to the convergent parameter space found using
sequential CART. In the previous work, we allowed ʻ to range over [0.1, 0.9], ʳ
to range over [0.95, 0.99], and to range over [0.7, 0.9], and these ranges are all
generally consistent with what was found using sequential CART (Fig. 5.3 ). In the
previous work, as γ approached 0.95, the probability of convergence became small, and all
convergent subregions for γ were found to lie just above 0.95. Excluding variations
that are possibly due to the sampling resolution, the previous work showed rather
consistent convergence across values of λ, and the convergent subregions in the
current work show that λ can be set over a rather large range of [0.000, 0.952].
Because the methods used to assess the performance of reinforcement learning (i.e.,
the number of time steps to reach the goal) differ, with the previous work using a
linear model and the current work using kriging metamodeling and global sensitivity
analysis, we refrain from making any performance comparisons.