effect on convergence speed, only γ has a significant effect on mean performance
(though the effect is only 4 time steps), and λ and ε do not affect the quality of the
solution. In other words, while λ, γ, and ε and their interactions significantly affect
convergence, they have little or no practical impact on mean performance. A natural
extension of this work is to use additional design of experiments methods, such as
response surface methodologies, to optimize parameter settings.
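As a purely illustrative sketch of that extension (the data, parameter ranges, and the use of scikit-learn below are assumptions, not results from this study), a second-order response surface could be fit to mean-performance measurements collected over a small factorial design in λ, γ, and ε and then searched for promising settings:

    # Illustrative response surface fit over hypothetical (lambda, gamma, epsilon)
    # settings; the design points and responses below are assumptions, not study data.
    import numpy as np
    from sklearn.preprocessing import PolynomialFeatures
    from sklearn.linear_model import LinearRegression

    # Hypothetical 2^3 factorial design with center points, and the mean
    # performance (e.g., time steps to reach the goal) observed at each point.
    X = np.array([[0.3, 0.90, 0.05], [0.3, 0.99, 0.15],
                  [0.7, 0.90, 0.15], [0.7, 0.99, 0.05],
                  [0.3, 0.90, 0.15], [0.3, 0.99, 0.05],
                  [0.7, 0.90, 0.05], [0.7, 0.99, 0.15],
                  [0.5, 0.95, 0.10], [0.5, 0.95, 0.10]])
    y = np.array([142.0, 139.5, 141.2, 138.8, 141.6,
                  140.2, 140.9, 139.1, 140.1, 140.4])

    # Second-order model: main effects, two-way interactions, and quadratic terms.
    quad = PolynomialFeatures(degree=2, include_bias=False)
    surface = LinearRegression().fit(quad.fit_transform(X), y)

    # Search the fitted surface on a grid for a promising parameter setting.
    grid = np.array([[l, g, e]
                     for l in np.linspace(0.1, 0.9, 9)
                     for g in np.linspace(0.90, 0.99, 10)
                     for e in np.linspace(0.05, 0.20, 4)])
    predicted = surface.predict(quad.transform(grid))
    print("Most promising setting on the grid:", grid[predicted.argmin()])

The factorial-plus-center-points layout above mirrors a common starting design for response surface work; in practice the fitted surface would be refined with additional runs near the predicted optimum.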
From a design of experiments perspective, this experiment has a unique characteristic
in that some runs may not converge, and this scenario has received little or no
attention in the literature. These unique outcomes motivated the use of the sequential
analysis in order to separate convergence effects from parameter effects. Experimental
design D2 focused on a small parameter space that converged very frequently, though
it did not always converge. Furthermore, caution should be used when extrapolating
the results to parameters outside of the ranges used in this study, as severe nonlinearities
were observed at the edges of the parameter space (e.g., λ = 0.01 or 0.99, or γ = 1).
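One way to make that separation concrete, sketched here under assumptions that are not taken from the study itself (synthetic data, hypothetical column names, and the statsmodels library), is to model convergence as a binary response in a first stage and then to model mean performance using only the converged runs in a second stage:

    # Sketch of a two-stage analysis that separates convergence effects from
    # parameter effects; all data and column names here are illustrative assumptions.
    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    rng = np.random.default_rng(0)
    n = 200
    runs = pd.DataFrame({
        "lam":   rng.uniform(0.1, 0.9, n),
        "gamma": rng.uniform(0.90, 0.99, n),
        "eps":   rng.uniform(0.05, 0.20, n),
    })
    # Hypothetical outcomes: convergence and performance depend weakly on the
    # parameters, plus noise (purely for illustration).
    p_conv = 1.0 / (1.0 + np.exp(-(2.0 - 2.0 * runs["lam"])))
    runs["converged"] = rng.binomial(1, p_conv)
    runs["performance"] = np.where(
        runs["converged"] == 1,
        140.0 - 3.0 * runs["gamma"] + rng.normal(0.0, 1.0, n),
        np.nan,
    )

    # Stage 1: logistic model for whether a run converges at all.
    conv_model = smf.logit("converged ~ lam + gamma + eps", data=runs).fit(disp=False)

    # Stage 2: linear model for mean performance, using converged runs only.
    perf_model = smf.ols("performance ~ lam + gamma + eps",
                         data=runs[runs["converged"] == 1]).fit()

    print(conv_model.params)
    print(perf_model.params)

The first stage addresses whether a run converges at all, and the second stage then compares parameter effects on performance among runs that did converge, which parallels the separation described above.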
This study investigated the effects of the primary variables of the TD(λ) algorithm,
though the learning rate α likely also has an effect on learning, and this could be
included in future work. Finally, the extensibility of the findings presented herein
to different representations, learning algorithms, or domains is unknown, and more
exhaustive studies are needed to form generalizable conclusions.
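For context, the sketch below shows where these parameters enter a standard tabular TD(λ) update with accumulating eligibility traces; the random-walk environment and random policy are hypothetical stand-ins used only to make the roles of α, γ, and λ explicit, and they are not the representation or domain used in this study.

    # Minimal tabular TD(lambda) sketch with accumulating eligibility traces.
    # The environment and policy are hypothetical, for illustration only.
    import numpy as np

    class RandomWalk:
        """Tiny 5-state random walk used only to exercise the update."""
        def __init__(self, n=5):
            self.n, self.s = n, n // 2
        def reset(self):
            self.s = self.n // 2
            return self.s
        def step(self, action):
            self.s += 1 if action > 0 else -1
            done = self.s < 0 or self.s >= self.n
            reward = 1.0 if self.s >= self.n else 0.0
            return min(max(self.s, 0), self.n - 1), reward, done

    def td_lambda_episode(env, policy, V, alpha=0.1, gamma=0.95, lam=0.7):
        e = np.zeros_like(V)                          # eligibility trace per state
        s, done = env.reset(), False
        while not done:
            s_next, r, done = env.step(policy(s))
            delta = r + gamma * (0.0 if done else V[s_next]) - V[s]  # TD error
            e[s] += 1.0                               # accumulating trace
            V += alpha * delta * e                    # alpha scales every update
            e *= gamma * lam                          # traces decay by gamma * lambda
            s = s_next
        return V

    V = np.zeros(5)
    policy = lambda s: np.random.choice([-1, 1])      # random behavior policy
    for _ in range(100):
        td_lambda_episode(RandomWalk(), policy, V)
    print(V)

In this form it is apparent that α scales every update, while γ and λ jointly control how far credit propagates backward through the eligibility traces.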