Table A.2 Experiment parameter settings.
Experiment       Parameter                    Value
nodes 20         # hidden nodes               20
nodes 80         # hidden nodes               80
w LeCun          Weight init. method          N(0, σ_w)
tanh             Hidden transfer function     f(x) = tanh(x)
tanh LeCun       Hidden transfer function     f(x) = 1.7159 tanh(2x/3)
α anneal         Learning rate α              linearly annealed
α Emb            Learning rate α              init. as per (Embrechts et al. 2010)
α anneal, Emb    Learning rate α              annealed, init. as per (Embrechts et al. 2010)
0.90             P(action exploitation)       = 0.90
anneal           P(action exploitation)       annealed from 0.75 to 1
γ 0.75           Next-state decay parameter   γ = 0.75
λ 0.9            Temporal discount factor     λ = 0.9
λ 0.4            Temporal discount factor     λ = 0.4
batch 1/mom      batch, w/momentum            epoch length = 1, η = 0.5
batch 10/mom     batch, w/momentum            epoch length = 10, η = 0.5
games 100k       # training games             100,000
parameters, such as the learning rate α or the number of hidden nodes, the parameter settings were selected to coarsely explore the network's learning ability for variations from the base scenario. For the experiments described below, a single parameter was changed and all others remained as those used in the base experiment described above. Table A.2 lists which parameters were changed for each experiment.
• Hidden nodes: nodes 20 and nodes 80 evaluated the effect of changing the number of nodes in the hidden layer.
• Weight initialization: The weight initialization method was changed in the w LeCun experiment to the method described in (LeCun et al. 1998), which initializes weights by sampling from a distribution N(0, σ_w), where σ_w = m^(-1/2) and m is the number of weights leading into node w. A short initialization sketch follows this list.
• Hidden transfer function: The hidden transfer function was changed in two experiments to either the tanh function or a modified version of the tanh function presented in (LeCun et al. 1998), as shown in Table A.1; both forms are sketched after this list.
• Learning rate (α): The effect of the learning rate α was evaluated using three experiments: (1) linearly annealing α over the course of training (constant across layers), (2) setting the learning rates, by layer, similarly to that described in (Embrechts et al. 2010), and (3) combining the first two cases. More specifically, the learning rates in case 2 were set by: (1) setting all learning rates to 1, (2) scaling the learning rate of the input-hidden layer to 2, and (3) scaling all learning rates such that the largest was 0.001. The last step is a slight deviation from (Embrechts et al. 2010), where the largest learning rate was set such that it was equal to the learning rate used in the base experiment (0.001). Both schemes are sketched after this list.
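The following is a minimal sketch of the w LeCun initialization, assuming σ_w is the standard deviation of the sampling distribution; the layer sizes and variable names are placeholders, not values from the experiments.

```python
import numpy as np

def lecun_init(m, n_out, rng):
    # Draw a weight matrix from N(0, sigma_w) with sigma_w = m**(-1/2),
    # where m is the number of weights leading into each node (the fan-in).
    sigma_w = m ** -0.5
    return rng.normal(loc=0.0, scale=sigma_w, size=(m, n_out))

rng = np.random.default_rng(0)
W_input_hidden = lecun_init(m=40, n_out=20, rng=rng)  # e.g. 40 inputs feeding 20 hidden nodes
print(W_input_hidden.std())                           # roughly 40**-0.5 = 0.158
```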
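The two hidden transfer functions can be written directly; the scaled form below follows the standard recommendation in LeCun et al. (1998) and matches the tanh LeCun entry in Table A.2.

```python
import numpy as np

def tanh_transfer(x):
    # tanh experiment: f(x) = tanh(x)
    return np.tanh(x)

def lecun_tanh_transfer(x):
    # tanh LeCun experiment: f(x) = 1.7159 * tanh(2x/3),
    # scaled so that f(1) and f(-1) are approximately +1 and -1
    return 1.7159 * np.tanh(2.0 * x / 3.0)

x = np.linspace(-2.0, 2.0, 5)
print(tanh_transfer(x))
print(lecun_tanh_transfer(x))
```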
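The two learning-rate schemes might be sketched as below, assuming the annealed rate decreases linearly to zero by the end of training and that there are only two weight layers (input-hidden and hidden-output); the defaults and layer names are illustrative.

```python
def linearly_annealed_alpha(step, total_steps, alpha_start=0.001, alpha_end=0.0):
    # alpha anneal: a single learning rate, shared across layers, decreased
    # linearly over training (the start and end values are assumptions).
    frac = min(step / total_steps, 1.0)
    return alpha_start + frac * (alpha_end - alpha_start)

def embrechts_style_alphas(max_alpha=0.001):
    # alpha Emb: per-layer rates set as described above:
    # (1) start all rates at 1, (2) scale the input-hidden rate to 2,
    # (3) rescale everything so the largest rate equals max_alpha.
    alphas = {"input-hidden": 1.0, "hidden-output": 1.0}
    alphas["input-hidden"] *= 2.0
    scale = max_alpha / max(alphas.values())
    return {layer: a * scale for layer, a in alphas.items()}

print(linearly_annealed_alpha(step=50_000, total_steps=100_000))  # 0.0005
print(embrechts_style_alphas())  # {'input-hidden': 0.001, 'hidden-output': 0.0005}
```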
 