Parameter Effects in the Game of Chung Toi - Design of Experiments for Reinforcement Learning - page 157

Table A.1 Parameter settings for the base experiment.

Parameter                         Value
# hidden nodes                    40
Weight init. method               Sampled from ∼ U[−0.2, 0.2]
Hidden layer transfer function    Sigmoid: f(x) = 1 / (1 + e^(−x))
Output layer transfer function    Linear: f(x) = x
Learning rate                     α = 0.001 (constant across layers)
P(action exploitation)            0.75
Next-state decay parameter        γ = 1.0
Temporal discount factor          λ = 0.7
Weight update method              Iterative, η = 0.0
# of training games               10,000
# of evaluation games             500
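
For readers who want to reproduce the base settings, the values in Table A.1 could be collected in a single configuration object. The following is only a minimal sketch: the class name and field names are hypothetical and do not come from the original implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BaseExperiment:
    """Base-experiment settings from Table A.1 (field names are illustrative)."""
    n_hidden: int = 40
    weight_init_low: float = -0.2     # lower bound of the uniform weight init
    weight_init_high: float = 0.2     # upper bound of the uniform weight init
    learning_rate: float = 0.001      # alpha, constant across layers
    p_exploit: float = 0.75           # P(action exploitation) during training
    gamma: float = 1.0                # next-state decay parameter
    lam: float = 0.7                  # temporal discount factor (lambda)
    eta: float = 0.0                  # listed with the iterative weight update method
    n_training_games: int = 10_000
    n_evaluation_games: int = 500


base = BaseExperiment()
```

Freezing the dataclass keeps the reference settings immutable, so a variant experiment would be created as a new object (for example with `dataclasses.replace`) rather than by editing the base in place.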

evaluated after every 1000 training games by playing 500 evaluation games. Performance was quantified as the proportion of the 500 evaluation games won. Games that lasted more than 100 moves were scored as draws, although in reality there are no draws in Chung Toi. During evaluation games, player 1 always played the first move, always selected the move corresponding to the greatest state value (P(action exploitation) = 1), and was not allowed to take greedy wins or blocks. The opponent selected moves at random but was considered 'smart' in that it was allowed to take greedy wins and blocks.
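
The evaluation protocol described above can be sketched in a few helper functions. This is a hedged illustration, not the original code: the function names and the outcome labels ('win', 'loss', 'draw') are assumptions, and the game engine and the opponent's greedy win/block rule are not shown.

```python
import random

MAX_MOVES = 100      # games with more moves than this are scored as draws
EVAL_EVERY = 1000    # training games between evaluations


def select_move(values, p_exploit, rng):
    """Return the index of the highest-valued move with probability
    p_exploit; otherwise return a uniformly random move index."""
    if rng.random() < p_exploit:
        return max(range(len(values)), key=values.__getitem__)
    return rng.randrange(len(values))


def score_game(winner, n_moves, max_moves=MAX_MOVES):
    """Label a finished game from player 1's point of view."""
    if n_moves > max_moves:
        return "draw"
    return "win" if winner == 1 else "loss"


def win_proportion(outcomes):
    """Proportion of evaluation games won by player 1."""
    return sum(o == "win" for o in outcomes) / len(outcomes)


def evaluation_checkpoints(n_training_games, every=EVAL_EVERY):
    """Training-game counts after which an evaluation is run."""
    return list(range(every, n_training_games + 1, every))
```

With `p_exploit = 1.0`, as in the evaluation games, `select_move` always returns the greedy move because `rng.random()` is strictly less than 1. With 10,000 training games, `evaluation_checkpoints` yields ten evaluation points.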

A.2.6 Experiments

The implementation described above requires numerous parameters and settings related to the TD(λ) algorithm, the neural network, or the game of Chung Toi, and these likely affect the ability of the network to learn. A base experiment (base) was run to serve as a reference point; its parameters were set as recommended in other implementations of TD(λ) or for neural networks in general (Table A.1; Tesauro 1992; Wiering 1995).

Some parameters were held constant for all experiments. Reward values consisted of 1 if player 1 wins, −1 if player 2 wins, and 0 for a draw. The neural network was a fully connected 3-layer network with 20 input nodes, 40 hidden nodes (except in the experiments that varied the number of hidden nodes), and 1 output node; the input and hidden layers also had a bias node with a constant input value of 1.
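
The fixed architecture and reward scheme can be illustrated with a small forward pass. This is a minimal pure-Python sketch under stated assumptions: the function names and outcome labels are hypothetical, the encoding of the board into the 20 inputs is not specified here and is left to the caller, and the weight range and transfer functions are taken from Table A.1.

```python
import math
import random


def init_weights(n_out, n_in, rng, low=-0.2, high=0.2):
    """One row per receiving node; each row holds n_in weights plus one
    bias weight, all sampled uniformly from [low, high]."""
    return [[rng.uniform(low, high) for _ in range(n_in + 1)]
            for _ in range(n_out)]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def forward(x, w_hidden, w_out):
    """State value for input vector x: sigmoid hidden layer, linear output."""
    xb = list(x) + [1.0]                       # input-layer bias node
    hidden = [sigmoid(sum(w * v for w, v in zip(row, xb)))
              for row in w_hidden]
    hb = hidden + [1.0]                        # hidden-layer bias node
    return sum(w * v for w, v in zip(w_out[0], hb))


def reward(outcome):
    """Reward from player 1's perspective (labels are illustrative)."""
    return {"player1": 1.0, "player2": -1.0, "draw": 0.0}[outcome]


rng = random.Random(0)
w_hidden = init_weights(40, 20, rng)           # 20 inputs (+ bias) -> 40 hidden
w_out = init_weights(1, 40, rng)               # 40 hidden (+ bias) -> 1 output
value = forward([0.0] * 20, w_hidden, w_out)
```

Including the bias weights, this network has 40 × 21 + 41 = 881 trainable weights.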

Individual experiments were then used to evaluate the effects of changing parameters and settings related to the TD(λ) algorithm, the neural network, and the game of Chung Toi. Some of these settings have been found to have a significant impact on the training of neural networks (LeCun et al. 1998; Embrechts et al. 2010). For other
