(Thrun 1995), although other work suggests that learning can be successful when γ = 1 (Ghory 2004). We find that, for the mountain car and the TBU problems, γ should range approximately over [0.96, 0.99], whereas for the TTBU problem, γ should range approximately over [0.97, 0.99]. In other words, slight next-state value discounting seems to be useful across all of these problems.
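To make the role of γ concrete, the following minimal tabular Q-learning update (an illustrative sketch, not the chapter's neural-network learner; the function and argument names are hypothetical) shows how the next-state value is discounted by γ before being folded into the TD target:

```python
def q_update(q, s, a, r, s_next, actions, alpha=0.01, gamma=0.97):
    """One temporal-difference update on a dict-based Q-table.

    gamma slightly below 1 (e.g., in [0.96, 0.99]) discounts the
    next-state value, as the convergent subregions suggest.
    """
    # Greedy estimate of the next state's value, discounted by gamma.
    best_next = max(q.get((s_next, b), 0.0) for b in actions)
    old = q.get((s, a), 0.0)
    # Move the current estimate toward the TD target r + gamma * best_next.
    q[(s, a)] = old + alpha * (r + gamma * best_next - old)
    return q[(s, a)]
```

With γ = 1 the target would weight future value fully; values of γ just below 1 apply the slight next-state discounting discussed above.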
The ranges of the magnitude (α_mag) and ratio (α_ratio) of the learning rates of the neural network for the convergent subregions in the mountain car problem were found to span almost the entire original parameter space for these parameters. In the TBU and TTBU problems, the magnitude of the learning rates should generally be set closer to 0.01. The ratio of the learning rates in the mountain car problem was found to range over approximately [2, 8] in the convergent subregions. In the TBU problem, this ratio could range over [1, 8], and in the TTBU problem over [2, 3.5] in the convergent subregions. Intuitively, a larger ratio of the learning rates between the layers (where layers closer to the input layer have larger learning rates) makes sense, as the error gradients at layers closer to the input layer are generally smaller; such per-layer rates have also been suggested to be useful in the non-reinforcement-learning literature (Embrechts et al. 2010). Considering all three problems together, the results of this work suggest that a learning rate ratio greater than 1 could aid convergence; however, additional experimentation would be required to confirm this. We also evaluated whether the number of nodes in the hidden layer
of the neural network had an effect on learning in the TBU problem. Interestingly,
we found that a minimum of about 26 hidden nodes was required for the convergent
subregions, suggesting that the neural network needs a minimum level of complexity
to be able to learn this task.
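The per-layer learning-rate scheme described above can be sketched as a plain SGD step in which the input-side layer's rate is scaled up by the ratio. This is a minimal illustration with list-based weights; the function name and representation are hypothetical, not the chapter's implementation:

```python
def per_layer_step(w_hidden, w_out, g_hidden, g_out,
                   lr_mag=0.01, lr_ratio=4.0):
    """One SGD update with distinct per-layer learning rates.

    The hidden (input-side) layer uses lr_mag * lr_ratio, compensating
    for its generally smaller error gradients; the output layer uses
    the base rate lr_mag.
    """
    lr_in = lr_mag * lr_ratio
    # Larger step for the layer closer to the input.
    w_hidden = [w - lr_in * g for w, g in zip(w_hidden, g_hidden)]
    # Base step for the output layer.
    w_out = [w - lr_mag * g for w, g in zip(w_out, g_out)]
    return w_hidden, w_out
```

With lr_mag = 0.01 and lr_ratio = 4, the hidden layer effectively trains at 0.04 while the output layer trains at 0.01, matching the "ratio greater than 1" pattern seen in the convergent subregions.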
Looking at the convergent parameter subregions for each problem at a higher level,
we see that the mountain car problem had rather wide subregions. This suggests that
this problem is relatively simple and that very specific parameter ranges are not
required for learning to be successful. In the TBU problem, we found that there
are very specific and consistent parameter subregions where learning can occur.
Similarly, the TTBU problem was found to require relatively small parameter subregions for learning to occur, though we note again that the original parameter space for this experiment was smaller than in the other problems.
If practitioners merely use parameter settings recommended in the literature, they may well choose parameter combinations that do not fall within convergent parameter subregions. We note, though, that we do not guarantee that the convergent parameter subregions found in this work are the only subregions that allow for successful reinforcement learning. The identification of these convergent subregions is dependent on the experimentation, and this will be discussed later. Additionally, small changes to the problem characteristics (e.g., action space, reward structure, problem dynamics) would likely change the convergent parameter subregions. We believe that there should be some consistency among parameter subregions for similar types of domain problems; however, additional work exploring the effects of altering domain characteristics would be required to support this claim.
In the mountain car and TBU problems, we explored the performance of reinforcement learning using kriging metamodels. The influence of the parameters in
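A kriging metamodel of this kind can be sketched as Gaussian-process-style interpolation over observed (parameter setting, performance) pairs. The one-dimensional version below, using an RBF correlation function and a small Gaussian-elimination solver, is an illustrative assumption about the approach, not the study's actual metamodel or its hyperparameters:

```python
import math

def rbf(a, b, ell=0.5):
    # Squared-exponential (RBF) correlation between two parameter values.
    return math.exp(-0.5 * (a - b) ** 2 / ell ** 2)

def solve(A, b):
    # Gaussian elimination with partial pivoting (small systems only).
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def krige_predict(xs, ys, x_new, ell=0.5, noise=1e-8):
    """Predict performance at a new parameter value from observed runs.

    Solves K w = y for the kriging weights, then returns the
    correlation-weighted combination k(x_new) . w.
    """
    K = [[rbf(xi, xj, ell) + (noise if i == j else 0.0)
          for j, xj in enumerate(xs)] for i, xi in enumerate(xs)]
    w = solve(K, ys)
    return sum(rbf(x_new, xi, ell) * wi for xi, wi in zip(xs, w))
```

Because the nugget (noise) term is tiny, the metamodel interpolates the observed runs exactly, which is the standard kriging behavior when the simulation outputs are treated as deterministic.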