(Thrun 1995), although other work suggests that learning can be successful when γ = 1 (Ghory 2004). We find that, for the mountain car and the TBU problems, γ should range approximately over [0.96, 0.99], whereas for the TTBU problem, γ should range approximately over [0.97, 0.99]. In other words, slight next-state value discounting seems to be useful across all of these problems.
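To make the role of γ concrete, the following minimal tabular Q-learning update (an illustrative sketch, not the chapter's neural-network learner; the function and argument names are hypothetical) shows how the next-state value is discounted by γ before being folded into the TD target:

```python
def q_update(q, s, a, r, s_next, actions, alpha=0.01, gamma=0.97):
    """One temporal-difference update on a dict-based Q-table.

    gamma slightly below 1 (e.g., in [0.96, 0.99]) discounts the
    next-state value, as the convergent subregions suggest.
    """
    # Greedy estimate of the next state's value, discounted by gamma.
    best_next = max(q.get((s_next, b), 0.0) for b in actions)
    old = q.get((s, a), 0.0)
    # Move the current estimate toward the TD target r + gamma * best_next.
    q[(s, a)] = old + alpha * (r + gamma * best_next - old)
    return q[(s, a)]
```

With γ = 1 the target would weight future value fully; values of γ just below 1 apply the slight next-state discounting discussed above.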
The ranges of the magnitude (α_mag) and ratio (α_ratio) of the learning rates of the neural network for the convergent subregions in the mountain car problem were found to span almost the entire original parameter space for these parameters. In the TBU and TTBU problems, the magnitude of the learning rates should generally be set closer to 0.01. The ratio of the learning rates in the mountain car problem was found to range over approximately [2, 8] in the convergent subregions. In the TBU problem, this ratio could range over [1, 8], and in the TTBU problem over [2, 3.5] in the convergent subregions. Intuitively, a larger ratio of the learning rates between the layers (where layers closer to the input layer have larger learning rates) makes sense, as the error gradients at layers closer to the input layer are generally smaller; such per-layer rates have also been suggested to be useful in the non-reinforcement-learning literature (Embrechts et al. 2010). Considering all three problems together, the results of this work suggest that a learning rate ratio greater than 1 could aid convergence; however, additional experimentation would be required to confirm this. We also evaluated whether the number of nodes in the hidden layer
of the neural network had an effect on learning in the TBU problem. Interestingly,
we found that a minimum of about 26 hidden nodes was required for the convergent
subregions, suggesting that the neural network needs a minimum level of complexity
to be able to learn this task.
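The per-layer learning-rate scheme described above can be sketched as a plain SGD step in which the input-side layer's rate is scaled up by the ratio. This is a minimal illustration with list-based weights; the function name and representation are hypothetical, not the chapter's implementation:

```python
def per_layer_step(w_hidden, w_out, g_hidden, g_out,
                   lr_mag=0.01, lr_ratio=4.0):
    """One SGD update with distinct per-layer learning rates.

    The hidden (input-side) layer uses lr_mag * lr_ratio, compensating
    for its generally smaller error gradients; the output layer uses
    the base rate lr_mag.
    """
    lr_in = lr_mag * lr_ratio
    # Larger step for the layer closer to the input.
    w_hidden = [w - lr_in * g for w, g in zip(w_hidden, g_hidden)]
    # Base step for the output layer.
    w_out = [w - lr_mag * g for w, g in zip(w_out, g_out)]
    return w_hidden, w_out
```

With lr_mag = 0.01 and lr_ratio = 4, the hidden layer effectively trains at 0.04 while the output layer trains at 0.01, matching the "ratio greater than 1" pattern seen in the convergent subregions.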
Looking at the convergent parameter subregions for each problem at a higher level,
we see that the mountain car problem had rather wide subregions. This suggests that
this problem is relatively simple and that very specific parameter ranges are not
required for learning to be successful. In the TBU problem, we found that there
are very specific and consistent parameter subregions where learning can occur.
Similarly, the TTBU problem was found to require relatively small parameter subregions for learning to occur, though we note again that the original parameter space for this experiment was smaller than in the other problems.
If practitioners merely use parameter settings recommended in the literature, they may well choose parameter combinations that do not fall within convergent parameter subregions. We note, though, that we do not guarantee that the convergent parameter subregions found in this work are the only subregions that allow for successful reinforcement learning. The identification of these convergent subregions is dependent on the experimentation, and this will be discussed later. Additionally, small changes to the problem characteristics (e.g., action space, reward structure, problem dynamics) would likely change the convergent parameter subregions. We believe that there should be some consistency among parameter subregions for similar types of domain problems; however, additional work exploring the effects of altering domain characteristics would be required to support this claim.
In the mountain car and TBU problems, we explored the performance of reinforcement learning using kriging metamodels. The influence of the parameters in
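A kriging metamodel of this kind can be sketched as Gaussian-process-style interpolation over observed (parameter setting, performance) pairs. The one-dimensional version below, using an RBF correlation function and a small Gaussian-elimination solver, is an illustrative assumption about the approach, not the study's actual metamodel or its hyperparameters:

```python
import math

def rbf(a, b, ell=0.5):
    # Squared-exponential (RBF) correlation between two parameter values.
    return math.exp(-0.5 * (a - b) ** 2 / ell ** 2)

def solve(A, b):
    # Gaussian elimination with partial pivoting (small systems only).
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def krige_predict(xs, ys, x_new, ell=0.5, noise=1e-8):
    """Predict performance at a new parameter value from observed runs.

    Solves K w = y for the kriging weights, then returns the
    correlation-weighted combination k(x_new) . w.
    """
    K = [[rbf(xi, xj, ell) + (noise if i == j else 0.0)
          for j, xj in enumerate(xs)] for i, xi in enumerate(xs)]
    w = solve(K, ys)
    return sum(rbf(x_new, xi, ell) * wi for xi, wi in zip(xs, w))
```

Because the nugget (noise) term is tiny, the metamodel interpolates the observed runs exactly, which is the standard kriging behavior when the simulation outputs are treated as deterministic.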