using a neural network, could learn a challenging environment (Gatti et al. 2011b).
The reinforcement learning algorithm and the neural network have numerous parameters that must be set by the user. For neural networks in general, specific parameter settings and techniques can be employed to enable efficient training (LeCun et al. 1998; Embrechts et al. 2010). The utility of these techniques has not been explored in the context of the TD(λ) algorithm. The purpose of this work is to explore the effects of various settings of the TD(λ) algorithm and of the neural network in a basic application of reinforcement learning to the game of Chung Toi.
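As a rough illustration of the kinds of user-set parameters involved, the sketch below collects typical TD(λ) and network settings into a single configuration object. The names and default values are hypothetical placeholders, not the settings examined in this work.

```python
from dataclasses import dataclass

@dataclass
class TrainingConfig:
    """Hypothetical grouping of the user-set parameters mentioned above."""
    # TD(lambda) settings
    lam: float = 0.7                 # eligibility-trace decay (the lambda in TD(lambda))
    alpha: float = 0.1               # learning rate for the value-function update
    gamma: float = 1.0               # discount applied to future rewards
    epsilon: float = 0.05            # exploration rate during action selection
    # neural-network settings
    hidden_units: int = 40           # size of a single hidden layer
    weight_scale: float = 0.1        # range of the random initial weights
    standardize_inputs: bool = True  # input scaling (cf. LeCun et al. 1998)

config = TrainingConfig()            # defaults can be overridden per experiment
```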
A.2 Methodology
A.2.1 Chung Toi
The game of Chung Toi is similar to Tic-Tac-Toe: it is played on a 3 × 3 board, and the goal of the game is to obtain three of one's pieces in a row. Chung Toi is unique in that each player has only three octagonal pieces, either white or red. Additionally, the game proceeds in two phases: the first phase consists of each player taking turns placing their pieces on the board, orienting each piece either cardinally or diagonally; the second phase consists of each player moving and/or rotating one of their pieces while attempting to align three of their pieces in a row. The pieces are labeled with arrows (Fig. A.1), which dictate the direction in which each particular piece is allowed to move. If the arrows are aligned cardinally with the board, the piece can be moved either horizontally or vertically to an open position, whereas if the arrows of a piece are aligned diagonally, the piece can move diagonally to an open position.
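To make the rules concrete, the following sketch shows one possible encoding of a Chung Toi position and of the moves permitted by a piece's orientation. The class and function names are illustrative, and the restriction to a single step onto an adjacent open position is an assumption of this sketch; the text above does not state how far a piece may travel.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass
class Piece:
    player: str      # "white" or "red"
    diagonal: bool   # False: arrows aligned cardinally; True: aligned diagonally

Board = List[List[Optional[Piece]]]   # 3 x 3 grid; None marks an open position

def legal_destinations(board: Board, row: int, col: int) -> List[Tuple[int, int]]:
    """Open positions the piece at (row, col) may move to.

    Cardinally oriented pieces move horizontally or vertically;
    diagonally oriented pieces move along the diagonals.  Movement is
    limited here to one step, which is an assumption of this sketch.
    """
    piece = board[row][col]
    if piece is None:
        return []
    if piece.diagonal:
        directions = [(-1, -1), (-1, 1), (1, -1), (1, 1)]
    else:
        directions = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    moves = []
    for dr, dc in directions:
        r, c = row + dr, col + dc
        if 0 <= r < 3 and 0 <= c < 3 and board[r][c] is None:
            moves.append((r, c))
    return moves
```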
A.2.2 The Reinforcement Learning Method
Reinforcement learning enables an agent to learn how to behave in an environment
by using iterative interactions with the environment to improve its ability to make
decisions (Fig. A.2). With regard to the game of Chung Toi, the agent iteratively plays the game and improves its ability to place and/or move pieces around the board in order to win the game. At any instant in the game, the agent senses the state of the environment, which contains all information necessary to evaluate the value of the state; such information can include the board configuration and which player is to play next. The agent selects actions by evaluating the values of all possible subsequent states and choosing the action that results in the greatest next-state value. Feedback is provided to the agent, in the form of rewards or penalties, that indicates the utility of the action, and this feedback is then used to improve the agent's estimate of the previous state's value. Implementing such a paradigm therefore requires multiple entities, including models of the environment and the agent, as well as a method by which the agent can improve its knowledge about the environment.
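The selection and update rules described above can be summarized in a few lines. The sketch below uses a simple table of state values and the one-step TD(0) form of the update for clarity; the full TD(λ) algorithm additionally spreads each correction back over earlier states via eligibility traces, and in this work the value table is replaced by a neural network. All names and constants are illustrative assumptions, not the authors' implementation.

```python
import random
from collections import defaultdict

values = defaultdict(float)   # estimated value of each state (a table stands in for the network)

def choose_action(state, legal_actions, next_state, epsilon=0.1):
    """Evaluate every reachable next state and pick the action whose resulting
    state has the greatest estimated value.  With probability epsilon an action
    is chosen at random so the agent keeps exploring (an assumption; the text
    describes only the greedy rule)."""
    if random.random() < epsilon:
        return random.choice(legal_actions)
    return max(legal_actions, key=lambda a: values[next_state(state, a)])

def td_update(prev_state, reward, new_state, alpha=0.1, gamma=1.0):
    """Use the feedback (reward or penalty) and the value of the state that
    followed to nudge the estimate of the previous state's value."""
    target = reward + gamma * values[new_state]
    values[prev_state] += alpha * (target - values[prev_state])
```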