Table 5.2 Variables and their associated ranges used in sequential CART for the mountain car problem.

Variable   Description                               Range            RL component
α_mag      Base (input-hidden layer) learning rate   [0.0005, 0.01]   Neural network
α_ratio    Learning rate ratio                       [1.0, 8.0]       Neural network
λ          Temporal discount factor                  [0.0, 1.0]       TD(λ) algorithm
γ          Next-state discount factor                [0.9, 1.0]       TD(λ) algorithm
P          Probability of action exploitation        [0.6, 1.0]       TD(λ) algorithm
5.2 Sequential CART
Sequential CART modeling was used to identify convergent parameter subregions. Five parameters were studied in this procedure, two related to the neural network and three related to the TD(λ) algorithm (Table 5.2). The ranges of these parameters generally include what is recommended for reinforcement learning or neural networks, though they were expanded in order to find potentially useful parameter subregions outside of what is typically recommended.
Both the magnitude of the learning rates and the ratio of the learning rates between the input-hidden and hidden-output layers are studied here. Generally, only the magnitude of the learning rates is given consideration, and a single learning rate is used for the entire network. However, the weight update equations for the TD(λ) algorithm (and the back-propagation algorithm) naturally result in the gradient of the input-hidden layer having a smaller magnitude than that of the hidden-output layer, and there is evidence supporting the use of learning rates that are scaled between layers (LeCun et al. 1998; Embrechts et al. 2010). We are therefore interested in whether the ratio of the learning rates has an effect on reinforcement learning, and we include this parameter in the study. The learning rates of the network were set based on the variables α_mag, a base learning rate between the input and hidden layers of the network, and α_ratio, the ratio of the learning rates for each subsequent layer of the network with respect to the input-hidden layer.
the network with respect to the input-hidden layer. For example, in a three-layer
neural network, the learning rates of the input-hidden layer ( ʱ hi ) and the hidden-
output layer ( ʱ oh ), respectively, would be set as ʱ hi
ʱ mag
ʱ ratio . The
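A minimal sketch of this parameterization follows; the function name is illustrative and not taken from the original implementation. It returns one learning rate per weight layer, with every layer after the input-hidden layer scaled by α_ratio relative to the base rate:

```python
def layer_learning_rates(alpha_mag, alpha_ratio, n_weight_layers=2):
    """Sketch: per-layer learning rates from the two sampled variables.

    The input-hidden layer uses the base rate alpha_mag; each subsequent
    layer is scaled by alpha_ratio with respect to the input-hidden layer.
    For a three-layer network (two weight layers):
        alpha_hi = alpha_mag
        alpha_oh = alpha_mag * alpha_ratio
    """
    return [alpha_mag] + [alpha_mag * alpha_ratio] * (n_weight_layers - 1)

alpha_hi, alpha_oh = layer_learning_rates(alpha_mag=0.005, alpha_ratio=4.0)
```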
The parameters λ, γ, and P are all related to the TD(λ) algorithm (Sect. 2.2.3), where λ is the temporal discount factor, γ is a next-state discount factor, and P is an action selection exploration-exploitation trade-off parameter.
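To make the role of P concrete, the following sketch assumes a discrete action set and a list of estimated action values for the current state; the exact exploration scheme used in the original study is not shown here, so this is only one plausible reading of an exploitation probability:

```python
import random

def select_action(q_values, P):
    """Sketch: exploit with probability P, otherwise explore uniformly.

    q_values is a list of estimated action values for the current state
    (an assumption for illustration, not the original interface).
    """
    if random.random() < P:
        # Exploit: take the highest-valued action.
        return max(range(len(q_values)), key=lambda a: q_values[a])
    # Explore: choose an action uniformly at random.
    return random.randrange(len(q_values))
```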
The parameters and settings used in the sequential CART algorithm are listed in Table 5.3. These settings were chosen based on preliminary experimentation and use with the sequential CART process. The initial, or seed, experimental run consisted of 60 design points with 3 replications each, and each subsequent sub-experiment in the sequential CART procedure used 20 new design points, also with 3 replications each. All designs were generated using Latin hypercube sampling. The proportion of points within each design labeled as low was 0.80, and the required convergence rate for any low leaf node was 90%.
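A sketch of how the seed and sub-experiment designs might be generated, using the parameter ranges from Table 5.2; scipy's qmc module stands in here for whatever Latin hypercube generator was actually used:

```python
from scipy.stats import qmc

# Parameter ranges from Table 5.2
# (order: alpha_mag, alpha_ratio, lambda, gamma, P).
lower = [0.0005, 1.0, 0.0, 0.9, 0.6]
upper = [0.01,   8.0, 1.0, 1.0, 1.0]

def lhs_design(n_points, seed=None):
    """Generate n_points Latin hypercube design points over the five ranges."""
    sampler = qmc.LatinHypercube(d=len(lower), seed=seed)
    unit = sampler.random(n=n_points)      # points in the unit hypercube
    return qmc.scale(unit, lower, upper)   # rescale to the Table 5.2 ranges

seed_design = lhs_design(60)  # initial (seed) run: 60 design points
sub_design = lhs_design(20)   # each sequential CART sub-experiment: 20 points
# Each design point would then be run with 3 replications.
```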