The next-state discount factor γ should be in the upper range of [0.9, 1.0], and
although this parameter isn't given much consideration in the literature, this range
is consistent with what is suggested. The action exploration-exploitation trade-off
parameter ε does not have a consistent range over its parameter space that allows for
convergence. However, in three subregions (13, 25, and 32), this parameter should be
towards the upper end of [0.6, 1.0], which is consistent with most applications and
with the intuition that, in general, knowledge should be exploited. Although a high
value of ε may choose many erroneous actions early in the learning process, the correct
actions will eventually surface and can then be exploited thereafter to maintain high
performance. It is interesting that there are subregions in which ε has a wide range
or takes low values (subregions 6 and 11, respectively). Within subregion 6, ε is
the most influential parameter with respect to performance, and we find that larger
values do yield better performance based on the surface projections. Subregion 11
has α_mag as its most influential parameter, and performance seems to improve with
larger values of this parameter.
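The exploitation behavior described above can be sketched with a simple action-selection rule. This is a minimal illustration, not the book's implementation; here ε is taken as the probability of exploiting (choosing the greedy action), matching the convention that larger values mean more exploitation, and all names are illustrative.

```python
import random

def epsilon_greedy(q_values, epsilon, rng=random):
    """Select an action index from a list of estimated action values.

    Here epsilon is the probability of *exploiting* (taking the greedy
    action), so larger epsilon means more exploitation, consistent with
    the range [0.6, 1.0] discussed in the text.
    """
    if rng.random() < epsilon:
        # Exploit: choose the action with the highest estimated value.
        return max(range(len(q_values)), key=lambda a: q_values[a])
    # Explore: choose an action uniformly at random.
    return rng.randrange(len(q_values))
```

With a high ε, early erroneous value estimates may still drive many greedy choices, but as the estimates improve, the same rule exploits the correct actions thereafter.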
The magnitude of the learning rate of the neural network, α_mag, can be anywhere
in the range [5 × 10⁻⁴, 0.01] for some subregions, but it must be more constrained
in other subregions. The ratio of the learning rates, α_ratio, can also take on a large
and varying range; however, we do find that this ratio must not be close to one, which
supports Embrechts et al. (2010), and this makes sense based on the magnitudes of
the error gradients between layers. Additionally, when considering α_mag and α_ratio
together, especially for subregions 25 and 32, which have small ranges for each
parameter, we see that if α_mag is larger, then α_ratio must be smaller (subregion 25),
and vice versa (subregion 32). In terms of performance within the convergent
subregions, α_mag has a great influence in two of the subregions, whereas α_ratio has
little influence in any of the subregions.
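One way the learning-rate magnitude and ratio can interact is sketched below. The mapping from (α_mag, α_ratio) to per-layer rates is an assumption for illustration, not the book's exact scheme: the output layer uses α_mag, and each earlier layer's rate is scaled by α_ratio to reflect the differing back-propagated gradient magnitudes.

```python
def layer_learning_rates(alpha_mag, alpha_ratio, n_hidden_layers=1):
    """Illustrative per-layer learning rates (hypothetical convention).

    The output layer trains at alpha_mag; each successively earlier
    layer is scaled by alpha_ratio. With alpha_ratio far from one, the
    layers train at genuinely different rates, consistent with the
    error-gradient magnitudes between layers noted in the text.
    """
    return [alpha_mag * alpha_ratio ** k          # earlier layers first
            for k in range(n_hidden_layers, -1, -1)]
```

Under this convention, the observed trade-off (a larger α_mag paired with a smaller α_ratio, and vice versa) keeps the largest per-layer rate bounded, which is one plausible reading of the small ranges seen in subregions 25 and 32.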
The mountain car problem was explored in previous work using a more classical
experimental design, defined over a parameter space known from prior experience to
converge relatively well (Gatti et al. 2013) (Appendix B).
A direct comparison of the results from the sequential CART approach to this previous
work is not possible, though some general observations can be made. If all convergent parameter
subregions are considered together, the neural network learning rate parameters used
in the previous work are very similar to the convergent parameter space found using
sequential CART. In the previous work, we allowed ʻ to range over [0.1, 0.9], ʳ
to range over [0.95, 0.99], and to range over [0.7, 0.9], and these ranges are all
generally consistent with what was found using sequential CART (Fig. 5.3 ). In the
previous work, as γ approached 0.95, the probability of convergence became small, and all
convergent subregions for γ were found to lie just above 0.95. Excluding variations
that are possibly due to the sampling resolution, the previous work showed rather
consistent convergence across values of λ, and the convergent subregions in the
current work show that λ can be set over a rather large range of [0.000, 0.952].
Because the methods used to assess the performance of reinforcement learning (i.e.,
the number of time steps to reach the goal) differ, with the previous work using a
linear model and the current work using kriging metamodeling and global sensitivity
analysis, we refrain from making any performance comparisons.