Table A.1 Parameter settings for the base experiment.

Parameter                         Value
# hidden nodes                    40
Weight init. method               Sampled from U[−0.2, 0.2]
Hidden layer transfer function    Sigmoid: f(x) = 1 / (1 + e^{−x})
Output layer transfer function    Linear: f(x) = x
Learning rate                     α = 0.001 (constant across layers)
P(action exploitation)            0.75
Next-state decay parameter        γ = 1.0
Temporal discount factor          λ = 0.7
Weight update method              Iterative, η = 0.0
# of training games               10,000
# of evaluation games             500

evaluated after every 1000 training games by playing 500 evaluation games.
Performance was quantified as the proportion of the 500 evaluation games won.
Games that lasted more than 100 moves were considered a draw, although in reality
there are no draws in Chung Toi. During evaluation games, player 1 always played the
first move, always selected the move corresponding to the greatest state value
(P(action exploitation) = 1), and was not allowed to take greedy wins or blocks.
The opponent selected moves at random, but was considered 'smart' in that it was
allowed to take greedy wins and blocks.
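
The evaluation rules above can be illustrated with a minimal Python sketch. It is not the authors' code: the function names, the string labels for game results, and the representation of candidate moves as successor states scored by a value function are all assumptions made for illustration. The sketch shows move selection under a given exploitation probability (0.75 during training in the base experiment, 1 during evaluation), the 100-move draw cutoff, and the proportion of games won.

```python
import random


def select_move(candidates, value_fn, p_exploit, rng):
    """Choose a successor state (hypothetical helper).

    With probability p_exploit the candidate with the greatest estimated
    state value is taken; otherwise a candidate is drawn uniformly at random.
    """
    if rng.random() < p_exploit:
        return max(candidates, key=value_fn)
    return rng.choice(list(candidates))


def classify_game(winner, n_moves, max_moves=100):
    """Label a finished game; games with more than max_moves moves count as draws."""
    return "draw" if n_moves > max_moves else winner


def proportion_won(results, player="player1"):
    """Fraction of games in `results` won by `player`."""
    return sum(1 for r in results if r == player) / len(results)


# Illustrative use with made-up candidate values and game results.
rng = random.Random(0)
greedy_choice = select_move([0.1, 0.9, 0.4], lambda s: s, 1.0, rng)
results = [classify_game("player1", 37), classify_game("player2", 120)]
print(greedy_choice, proportion_won(results))
```

Because `rng.random()` returns a value in [0, 1), an exploitation probability of 1 always takes the greedy branch, which matches the evaluation setting for player 1.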
A.2.6 Experiments
The above-described implementation requires numerous parameters and settings,
related to the TD(λ) algorithm, the neural network, or the game of Chung Toi, and
these parameters and settings likely affect the ability of the network to learn. A base
experiment (base) was run to serve as a reference point and used parameters that
were set as recommended in other implementations of TD(λ) or for neural networks
in general (Table A.1; Tesauro 1992; Wiering 1995).
Some parameters were set to remain constant for all experiments. Reward values
consisted of 1 if player 1 wins, −1 if player 2 wins, and 0 for a draw. The neural
network was a fully-connected 3-layer network with 20 input nodes, 40 hidden nodes
(except in the experiments which varied the number of hidden nodes), and 1 output
node; input and hidden layers also had a bias node with a constant input value of 1.
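
As a rough illustration of the fixed settings just described, the following Python sketch builds a 20-40-1 network with bias nodes, uniform weight initialization in [−0.2, 0.2], a sigmoid hidden layer, and a linear output, as listed in Table A.1. It is a sketch under stated assumptions rather than the authors' implementation: the names (`init_weights`, `forward`, `REWARDS`), the list-of-lists weight layout, and the all-zero input vector are hypothetical, and the encoding of the 20 inputs is not given in this section.

```python
import math
import random

N_INPUT, N_HIDDEN, N_OUTPUT = 20, 40, 1

# Reward values described in the text (dictionary keys are illustrative labels).
REWARDS = {"player1": 1.0, "player2": -1.0, "draw": 0.0}


def init_weights(n_rows, n_cols, rng, low=-0.2, high=0.2):
    """Weight matrix with entries sampled from U[low, high]."""
    return [[rng.uniform(low, high) for _ in range(n_cols)] for _ in range(n_rows)]


def sigmoid(x):
    """Hidden layer transfer function f(x) = 1 / (1 + e^(-x))."""
    return 1.0 / (1.0 + math.exp(-x))


def forward(x, w_hidden, w_output):
    """Forward pass: sigmoid hidden layer, single linear output node."""
    x_b = list(x) + [1.0]  # bias node with constant input 1
    hidden = [sigmoid(sum(w * v for w, v in zip(row, x_b))) for row in w_hidden]
    h_b = hidden + [1.0]  # bias node on the hidden layer
    return sum(w * v for w, v in zip(w_output[0], h_b))  # linear: f(x) = x


rng = random.Random(0)
w_hidden = init_weights(N_HIDDEN, N_INPUT + 1, rng)   # 40 x 21, includes bias
w_output = init_weights(N_OUTPUT, N_HIDDEN + 1, rng)  # 1 x 41, includes bias
value = forward([0.0] * N_INPUT, w_hidden, w_output)  # placeholder input vector
print(value)
```

Appending the constant 1 to each layer's input folds the bias weights into the same matrices as the ordinary weights, which keeps the sketch short; a separate bias vector would be equivalent.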
Individual experiments were then used to evaluate the effects of changing parameters
and settings related to the TD(λ) algorithm, the neural network, and the game
of Chung Toi. Some of these settings have been found to have a significant impact on
the training of neural networks (LeCun et al. 1998; Embrechts et al. 2010). For other
 