Table A.2  Experiment parameter settings.

Experiment        Parameter                    Value
nodes 20          # hidden nodes               20
nodes 80          # hidden nodes               80
w LeCun           Weight init. method          ~ N(0, σ_w)
tanh              Hidden transfer function     f(x) = tanh(x)
tanh LeCun        Hidden transfer function     f(x) = 1.7159 tanh((2/3)x)
α anneal          Learning rate                α linearly annealed
α Emb             Learning rate                α init. as per (Embrechts et al. 2010)
α anneal, Emb     Learning rate                α annealed, init. as per (Embrechts et al. 2010)
0.90              P(action exploitation)       = 0.90
anneal            P(action exploitation)       annealed from 0.75 to 1
λ 0.75            Next-state decay parameter   λ = 0.75
γ 0.9             Temporal discount factor     γ = 0.9
γ 0.4             Temporal discount factor     γ = 0.4
batch 1/mom       batch, w/momentum            epoch length = 1, μ = 0.5
batch 10/mom      batch, w/momentum            epoch length = 10, μ = 0.5
games 100k        # training games             100,000
parameters, such as α or the number of hidden nodes, the parameter settings were selected to coarsely explore the network's learning ability for variations from the base scenario. For the experiments described below, a single parameter was changed and all others remained as those used in the base experiment described above. Table A.2 lists which parameters were changed for each experiment.
Hidden nodes: nodes 20 and nodes 80 evaluated the effect of changing the number of nodes in the hidden layer.
Weight initialization: The weight initialization method was changed in the w LeCun experiment to the method described in (LeCun et al. 1998), which initializes weights by sampling from a distribution ~ N(0, σ_w), where σ_w = m^(−1/2) and m is the number of weights leading into node w.
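As a rough sketch of this initialization (a NumPy illustration; the layer sizes and the helper name `lecun_init` are assumptions, not from the text):

```python
import numpy as np

def lecun_init(fan_in, fan_out, rng=None):
    """Sample a (fan_in, fan_out) weight matrix from N(0, sigma_w),
    where sigma_w = m**(-1/2) and m is the number of incoming weights."""
    rng = rng or np.random.default_rng(0)
    sigma_w = fan_in ** -0.5
    return rng.normal(0.0, sigma_w, size=(fan_in, fan_out))

W = lecun_init(80, 20)  # e.g. 80 inputs feeding 20 hidden nodes
```

Each hidden node then receives inputs whose summed variance is roughly independent of the fan-in, which keeps the node's activation in the responsive region of tanh.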
Hidden transfer function: The hidden transfer function was changed in two experiments to either the tanh function or a modified version of the tanh function presented in (LeCun et al. 1998), as shown in Table A.1.
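For concreteness, the two candidate transfer functions can be sketched as follows (the scaling constants in the second are those given in LeCun et al. 1998):

```python
import numpy as np

def f_tanh(x):
    # Plain hyperbolic tangent transfer function.
    return np.tanh(x)

def f_tanh_lecun(x):
    # Scaled tanh from LeCun et al. (1998); the constants are
    # chosen so that f(1) is approximately 1 and f(-1) is approximately -1.
    return 1.7159 * np.tanh((2.0 / 3.0) * x)
```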
Learning rate (α): The effect of the learning rate α was evaluated using three experiments: (1) linearly annealing α over the course of training (constant across layers), (2) setting the learning rates, by layer, similarly to that described in (Embrechts et al. 2010), and (3) combining the first two cases. More specifically, the learning rates in case 2 were set by: (1) setting all learning rates to 1, (2) scaling the learning rate of the input-hidden layer to √2, and (3) scaling all learning rates such that the largest was 0.001. The last step is a slight deviation from (Embrechts et al. 2010): here the largest learning rate was set equal to the learning rate used in the base experiment (0.001).
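The three-step procedure of case 2 can be sketched as follows (a minimal illustration assuming a two-layer network; the helper name and layer ordering are assumptions, not from the text):

```python
import math

def per_layer_rates(n_layers=2, largest=0.001):
    # Step 1: initialize all per-layer learning rates to 1.
    rates = [1.0] * n_layers
    # Step 2: scale the input-hidden layer's rate to sqrt(2).
    rates[0] = math.sqrt(2)
    # Step 3: rescale so the largest rate equals `largest`
    # (0.001 here, matching the base experiment).
    scale = largest / max(rates)
    return [r * scale for r in rates]
```

With these settings the input-hidden layer ends at 0.001 and the hidden-output layer at 0.001/√2, preserving the √2 ratio between layers.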