Table A.2  Experiment parameter settings.

Experiment        Parameter                    Value
nodes 20          # hidden nodes               20
nodes 80          # hidden nodes               80
w LeCun           Weight init. method          ~ N(0, σ_w)
tanh              Hidden transfer function     f(x) = tanh(x)
tanh LeCun        Hidden transfer function     f(x) = 1.7159 tanh((2/3)x)
α anneal          Learning rate                α linearly annealed
α Emb             Learning rate                α init. as per (Embrechts et al. 2010)
α anneal, Emb     Learning rate                α annealed, init. as per (Embrechts et al. 2010)
0.90              P(action exploitation)       = 0.90
anneal            P(action exploitation)       annealed from 0.75 to 1
λ 0.75            Next-state decay parameter   λ = 0.75
γ 0.9             Temporal discount factor     γ = 0.9
γ 0.4             Temporal discount factor     γ = 0.4
batch 1/mom       batch, w/momentum            epoch length = 1, μ = 0.5
batch 10/mom      batch, w/momentum            epoch length = 10, μ = 0.5
games 100k        # training games             100,000
parameters, such as α or the number of hidden nodes, the parameter settings were selected to coarsely explore the network's learning ability for variations from the base scenario. For the experiments described below, a single parameter was changed and all others remained as those used in the base experiment described above. Table A.2 lists which parameters were changed for each experiment.
Hidden nodes: nodes 20 and nodes 80 evaluated the effect of changing the number of nodes in the hidden layer.
Weight initialization: The weight initialization method was changed in the w LeCun experiment to the method described in (LeCun et al. 1998), which initializes weights by sampling from a distribution ~ N(0, σ_w), where σ_w = m^(−1/2) and m is the number of weights leading into node w.
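As a rough sketch of this initialization (a NumPy illustration; the layer sizes and the helper name `lecun_init` are assumptions, not from the text):

```python
import numpy as np

def lecun_init(fan_in, fan_out, rng=None):
    """Sample a (fan_in, fan_out) weight matrix from N(0, sigma_w),
    where sigma_w = m**(-1/2) and m is the number of incoming weights."""
    rng = rng or np.random.default_rng(0)
    sigma_w = fan_in ** -0.5
    return rng.normal(0.0, sigma_w, size=(fan_in, fan_out))

W = lecun_init(80, 20)  # e.g. 80 inputs feeding 20 hidden nodes
```

Each hidden node then receives inputs whose summed variance is roughly independent of the fan-in, which keeps the node's activation in the responsive region of tanh.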
Hidden transfer function: The hidden transfer function was changed in two experiments to either the tanh function or a modified version of the tanh function presented in (LeCun et al. 1998), as shown in Table A.1.
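For concreteness, the two candidate transfer functions can be sketched as follows (the scaling constants in the second are those given in LeCun et al. 1998):

```python
import numpy as np

def f_tanh(x):
    # Plain hyperbolic tangent transfer function.
    return np.tanh(x)

def f_tanh_lecun(x):
    # Scaled tanh from LeCun et al. (1998); the constants are
    # chosen so that f(1) is approximately 1 and f(-1) is approximately -1.
    return 1.7159 * np.tanh((2.0 / 3.0) * x)
```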
Learning rate (α): The effect of the learning rate α was evaluated using three experiments: (1) linearly annealing α over the course of training (constant across layers), (2) setting the learning rates, by layer, similarly to that described in (Embrechts et al. 2010), and (3) combining the first two cases. More specifically, the learning rates in case 2 were set by: (1) setting all learning rates to 1, (2) scaling the learning rate of the input-hidden layer to √2, and (3) scaling all learning rates such that the largest was 0.001. The last step is a slight deviation from (Embrechts et al. 2010): here the largest learning rate was set equal to the learning rate used in the base experiment (0.001).
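The three-step procedure of case 2 can be sketched as follows (a minimal illustration assuming a two-layer network; the helper name and layer ordering are assumptions, not from the text):

```python
import math

def per_layer_rates(n_layers=2, largest=0.001):
    # Step 1: initialize all per-layer learning rates to 1.
    rates = [1.0] * n_layers
    # Step 2: scale the input-hidden layer's rate to sqrt(2).
    rates[0] = math.sqrt(2)
    # Step 3: rescale so the largest rate equals `largest`
    # (0.001 here, matching the base experiment).
    scale = largest / max(rates)
    return [r * scale for r in rates]
```

With these settings the input-hidden layer ends at 0.001 and the hidden-output layer at 0.001/√2, preserving the √2 ratio between layers.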