the convergent subregions in the mountain car problem were found to vary considerably. In other words, there was no consistency in the influence of parameters across the subregions, and thus the behavior of the response surfaces is likely quite distinct for each subregion. Furthermore, the interactions among parameters in the FANOVA graphs are quite different among the subregions. In the TBU problem, however, the influence of parameters across the subregions was almost completely consistent: γ had an overwhelming influence on the variability of the performance of reinforcement learning, and all other parameters had very little influence. Based on the FANOVA graphs, there is also some consistency among parameter interactions. The convergent parameter subregions for the TBU problem have very consistent ranges across all subregions.
Additional analysis on the ranges of the convergent subregions showed that none of
these regions are immediately adjacent to each other. However, they may be close
to each other, and this fact, along with the consistent parameter sensitivity profiles,
suggests that these subregions come from a similar portion of the parameter space.
The division of this portion of the parameter space into smaller subregions may reflect either genuinely separate convergent subregions or an artifact of the sampling of the experimental design, and this question could be resolved with a finer experimental design in this portion of the parameter space.
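Such a finer experimental design could be as simple as a denser full-factorial grid restricted to the suspect portion of the parameter space. The following is a minimal sketch of this idea; the parameter names and ranges are illustrative placeholders, not the actual design used in this work:

```python
from itertools import product

def refine_grid(ranges, points_per_dim=5):
    """Build a dense full-factorial design over the given parameter ranges.

    ranges: dict mapping parameter name -> (low, high) bounds.
    Returns a list of dicts, one per design point, covering the region
    uniformly with points_per_dim levels along each parameter axis.
    """
    axes = {
        name: [lo + i * (hi - lo) / (points_per_dim - 1) for i in range(points_per_dim)]
        for name, (lo, hi) in ranges.items()
    }
    names = list(axes)
    return [dict(zip(names, combo)) for combo in product(*(axes[n] for n in names))]

# Hypothetical subregion suspected of containing two separate convergent regions.
design = refine_grid({"gamma": (0.96, 0.99), "lambda": (0.85, 0.97)}, points_per_dim=5)
print(len(design))  # 25 design points to evaluate
```

Evaluating learning performance at each of these points would show whether convergence holds throughout the region (one subregion sampled coarsely) or only in disjoint pockets (truly separate subregions).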
It is possible that there are convergent parameter subregions, or unique character-
istics of the learning process, that are consistent for different classes of problems, but
that may be different across problem classes. The intuition behind this is consistent
with our belief that the success of reinforcement learning is dependent on the domain characteristics of the problem. To know for sure whether different problem classes share similarities, many more domains would have to be explored, which
would result in a significant knowledge repository for the reinforcement learning
community.
The problems explored in this work could all be considered vehicle control-type problems. Having explored only three different problems, one of which was a relaxed version of the true problem, it is difficult to give definitive guidance on parameter settings, especially considering that finding consistencies across domains was not a primary goal of this work. However, based on the convergent subregions for the problems explored, γ should generally be high (0.96-0.99), as should a second parameter (0.85-0.97). For λ, the only guidance we can give is not to simply follow what is typically used in the literature (0.5-0.7). In terms of the neural network, we would suggest setting the ratio of the learning rates to at least 2 and using a base learning rate close to 0.01. We should stress that these suggestions should only be used as starting points for the parameter settings, and that some exploration around these ranges will likely be required for different problems.
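These starting-point suggestions can be collected into a single configuration object. The sketch below is our own packaging of the ranges above, not code from this work; the field names are invented, and the λ default is merely illustrative, since the only guidance given is to avoid the literature's typical 0.5-0.7:

```python
from dataclasses import dataclass

@dataclass
class TDLambdaConfig:
    """Starting-point hyperparameters for TD(lambda) with a neural network.

    Each value is an initial guess drawn from the suggested ranges and
    should be explored around, not treated as final.
    """
    gamma: float = 0.97                # discount factor; suggested range 0.96-0.99
    lam: float = 0.9                   # trace decay; illustrative (avoid 0.5-0.7 default)
    base_learning_rate: float = 0.01   # base learning rate, close to 0.01 as suggested
    learning_rate_ratio: float = 2.0   # ratio of the two learning rates, at least 2

    @property
    def second_learning_rate(self) -> float:
        # The larger of the two network learning rates, derived from the ratio.
        return self.learning_rate_ratio * self.base_learning_rate

cfg = TDLambdaConfig()
print(cfg.second_learning_rate)  # 0.02
```

Centralizing the parameters this way makes the subsequent exploration around each range a matter of sweeping fields of one object rather than editing constants scattered through the learning code.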
The current work used only the bare-bones TD(λ) learning algorithm with no additional heuristics. This was done primarily to determine what types of problems this
learning algorithm could learn by itself. Heuristics are an effective way to improve
the performance of a base learning algorithm by tailoring or customizing the learning
process to the problem at hand. While the mountain car and the single trailer truck
backer-upper problem were learned successfully using only the TD(λ) algorithm,