using a neural network, could learn a challenging environment (Gatti et al. 2011b).
The reinforcement learning algorithm and the neural network have numerous parameters that must be set by the user. For neural networks in general, specific parameter settings and techniques can be employed to enable efficient training (LeCun et al. 1998; Embrechts et al. 2010). The utility of these techniques has not been explored in the context of the TD(λ) algorithm. The purpose of this work is to explore the effects of various settings of the TD(λ) algorithm and of the neural network in a basic application of reinforcement learning to the game of Chung Toi.
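As a rough illustration of the kinds of user-set parameters involved, the sketch below collects typical TD(λ) and network settings into a single configuration object. The names and default values are hypothetical placeholders, not the settings examined in this work.

```python
from dataclasses import dataclass

@dataclass
class TrainingConfig:
    """Hypothetical grouping of the user-set parameters mentioned above."""
    # TD(lambda) settings
    lam: float = 0.7                 # eligibility-trace decay (the lambda in TD(lambda))
    alpha: float = 0.1               # learning rate for the value-function update
    gamma: float = 1.0               # discount applied to future rewards
    epsilon: float = 0.05            # exploration rate during action selection
    # neural-network settings
    hidden_units: int = 40           # size of a single hidden layer
    weight_scale: float = 0.1        # range of the random initial weights
    standardize_inputs: bool = True  # input scaling (cf. LeCun et al. 1998)

config = TrainingConfig()            # defaults can be overridden per experiment
```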
A.2 Methodology
A.2.1 Chung Toi
The game of Chung Toi is similar to Tic-Tac-Toe: it is played on a 3 × 3 board, and the goal of the game is to obtain three of one's pieces in a row. Chung Toi is unique in that each player has only three octagonal pieces, either white or red. Additionally, the game proceeds in two phases: the first phase consists of each player taking turns placing their pieces on the board, orienting each piece either cardinally or diagonally; the second phase consists of each player moving and/or rotating one of their pieces while attempting to align three of their pieces in a row. The pieces are labeled with arrows (Fig. A.1), which dictate the direction in which each particular piece is allowed to move. If the arrows are aligned cardinally with the board, the piece can be moved either horizontally or vertically to an open position, whereas if the arrows of a piece are aligned diagonally, the piece can move diagonally to an open position.
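To make the rules concrete, the following sketch shows one possible encoding of a Chung Toi position and of the moves permitted by a piece's orientation. The class and function names are illustrative, and the restriction to a single step onto an adjacent open position is an assumption of this sketch; the text above does not state how far a piece may travel.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass
class Piece:
    player: str      # "white" or "red"
    diagonal: bool   # False: arrows aligned cardinally; True: aligned diagonally

Board = List[List[Optional[Piece]]]   # 3 x 3 grid; None marks an open position

def legal_destinations(board: Board, row: int, col: int) -> List[Tuple[int, int]]:
    """Open positions the piece at (row, col) may move to.

    Cardinally oriented pieces move horizontally or vertically;
    diagonally oriented pieces move along the diagonals.  Movement is
    limited here to one step, which is an assumption of this sketch.
    """
    piece = board[row][col]
    if piece is None:
        return []
    if piece.diagonal:
        directions = [(-1, -1), (-1, 1), (1, -1), (1, 1)]
    else:
        directions = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    moves = []
    for dr, dc in directions:
        r, c = row + dr, col + dc
        if 0 <= r < 3 and 0 <= c < 3 and board[r][c] is None:
            moves.append((r, c))
    return moves
```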
A.2.2 The Reinforcement Learning Method
Reinforcement learning enables an agent to learn how to behave in an environment
by using iterative interactions with the environment to improve its ability to make
decisions (Fig. A.2). With regard to the game of Chung Toi, the agent iteratively plays the game and improves its ability to place and/or move pieces around the board in order to win the game. At any instant in the game, the agent senses the state of the environment, which contains all information necessary to evaluate the value of the state; such information can include the board configuration and which player is to play next. The agent selects actions by evaluating the values of all possible subsequent states and choosing the action that results in the greatest next-state value. Feedback is provided to the agent, in the form of rewards or penalties, that indicates the utility of the action, and this feedback is then used to improve the agent's estimate of the previous state's value. Implementing such a paradigm therefore requires multiple entities, including models of the environment and the agent, as well as a method by which the agent can improve its knowledge about the environment.
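The selection and update rules described above can be summarized in a few lines. The sketch below uses a simple table of state values and the one-step TD(0) form of the update for clarity; the full TD(λ) algorithm additionally spreads each correction back over earlier states via eligibility traces, and in this work the value table is replaced by a neural network. All names and constants are illustrative assumptions, not the authors' implementation.

```python
import random
from collections import defaultdict

values = defaultdict(float)   # estimated value of each state (a table stands in for the network)

def choose_action(state, legal_actions, next_state, epsilon=0.1):
    """Evaluate every reachable next state and pick the action whose resulting
    state has the greatest estimated value.  With probability epsilon an action
    is chosen at random so the agent keeps exploring (an assumption; the text
    describes only the greedy rule)."""
    if random.random() < epsilon:
        return random.choice(legal_actions)
    return max(legal_actions, key=lambda a: values[next_state(state, a)])

def td_update(prev_state, reward, new_state, alpha=0.1, gamma=1.0):
    """Use the feedback (reward or penalty) and the value of the state that
    followed to nudge the estimate of the previous state's value."""
    target = reward + gamma * values[new_state]
    values[prev_state] += alpha * (target - values[prev_state])
```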