The network was evaluated against human opponents in order to analyze its
learned playing behavior. The network exhibited clear tactical knowledge, including
the ability to take and block wins, as well as what appeared to be look-ahead plays.
However, human opponents often exploited gaps in the network's knowledge by
repeatedly playing a winning strategy, against which the network could not defend
because of its deterministic style of play. The results of this evaluation indicate that
while the basic TD(λ) algorithm successfully trained the network to play the game,
the knowledge gained was below that of a novice player.
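The point about deterministic play can be made concrete with a small hypothetical sketch (in Python; this is an illustration, not the authors' implementation). With a purely greedy policy, identical positions always produce identical moves, which is exactly what lets an opponent replay a winning line; the epsilon parameter below is an illustrative addition that would randomize play.

import numpy as np

def select_move(value_fn, afterstates, epsilon=0.0, rng=None):
    """Pick a move by evaluating candidate afterstates.

    With epsilon = 0 the policy is fully deterministic (greedy), so a
    human opponent who finds one winning line can repeat it forever.
    A small epsilon > 0 would occasionally randomize the reply.
    """
    rng = rng or np.random.default_rng()
    if epsilon > 0.0 and rng.random() < epsilon:
        return int(rng.integers(len(afterstates)))   # exploratory move
    values = [value_fn(s) for s in afterstates]      # evaluate each option
    return int(np.argmax(values))                    # deterministic choice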
This work used only single runs of experiments for each parameter setting. TD(λ)
includes a random component, and thus performance may vary between runs with
the same parameter settings. Running additional experiments would provide a more
conclusive picture of the effects of each parameter. Other aspects of, and extensions
to, reinforcement learning that were not explored here also have the potential to
affect learning; these include alternative state encoding schemes, different training
opponents (Wiering 2010), and multi-step look-ahead (Tesauro 2002).
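As a minimal sketch of these moving parts (a hedged illustration, not the paper's code), the following NumPy fragment implements one episode of TD(λ) with eligibility traces and separate learning rates for the hidden and output layers, the kind of per-layer setting varied in the experiments. The random component enters through weight initialization and any exploratory move selection; the dimensions, learning rates, and state encoding here are illustrative assumptions, not the paper's actual configuration.

import numpy as np

rng = np.random.default_rng(0)  # seeding the random component fixes one "run"

# Illustrative dimensions; the paper's actual Chung Toi encoding differs.
N_IN, N_HID = 32, 20
W1 = rng.normal(0.0, 0.1, (N_HID, N_IN))  # input-to-hidden weights
W2 = rng.normal(0.0, 0.1, N_HID)          # hidden-to-output weights

def value(x):
    """Evaluate V(s) for a state vector x using sigmoid units."""
    h = 1.0 / (1.0 + np.exp(-W1 @ x))
    return 1.0 / (1.0 + np.exp(-W2 @ h)), h

def td_lambda_episode(states, reward, alpha1=0.1, alpha2=0.01,
                      lam=0.7, gamma=1.0):
    """Apply TD(lambda) updates over one episode's state sequence.

    alpha1 and alpha2 are separate learning rates for the hidden and
    output layers; lam is the trace-decay parameter lambda.
    """
    global W1, W2
    e1 = np.zeros_like(W1)  # one eligibility trace per weight
    e2 = np.zeros_like(W2)
    for t in range(len(states) - 1):
        v, h = value(states[t])
        v_next, _ = value(states[t + 1])
        # The terminal transition bootstraps from the reward, not V(s').
        target = reward if t == len(states) - 2 else gamma * v_next
        delta = target - v
        # Gradients of V with respect to each layer (sigmoid derivatives).
        g2 = v * (1.0 - v) * h
        g1 = np.outer(v * (1.0 - v) * W2 * h * (1.0 - h), states[t])
        e2 = gamma * lam * e2 + g2  # decay the traces, then accumulate
        e1 = gamma * lam * e1 + g1
        W2 += alpha2 * delta * e2   # per-layer learning rates
        W1 += alpha1 * delta * e1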
A.5 Conclusion
This work explored the effects of parameter settings in the TD(λ) algorithm for the
game of Chung Toi. Among the experiments in which only one parameter was
modified, those that produced stable and good maximal performance relative to the
basic implementation were those that used a different learning rate α for each layer
and those that used a relatively high degree of action exploitation. Modifying multiple
parameters at once resulted in worse performance that was unstable during training,
indicating that the effects of parameter settings are not additive. This work adds to
the body of literature concerning the application of neural network-based
reinforcement learning to board games.
References
Binkley, K. J., Seehart, K., & Hagiwara, M. (2007). A study of artificial neural network architectures
for Othello evaluation functions. Information and Media Technologies, 2(4), 1129-1139.
Embrechts, M. J., Hargis, B. J., & Linton, J. D. (2010). An augmented efficient backpropagation
training strategy for deep autoassociative neural networks. In Proceedings of the 2010 International
Joint Conference on Neural Networks (IJCNN), Barcelona, Spain, 18-23 July (pp. 1-6).
doi: 10.1109/IJCNN.2010.5596828
Gatti, C. J., Embrechts, M. J., & Linton, J. D. (2011a). Parameter settings of reinforcement learning
for the game of Chung Toi. In Proceedings of the 2011 IEEE International Conference on
Systems, Man, and Cybernetics (SMC 2011), Anchorage, AK, 9-12 October (pp. 3530-3535).
doi: 10.1109/ICSMC.2011.6084216
Gatti, C. J., Linton, J. D., & Embrechts, M. J. (2011b). A brief tutorial on reinforcement learning:
The game of Chung Toi. In Proceedings of the 19th European Symposium on Artificial Neural
Networks, Computational Intelligence and Machine Learning (ESANN), Bruges, Belgium, 27-
29 April (pp. 129-134). Bruges, Belgium: ESANN.