Tsitsiklis, J. N. & Van Roy, B. (1997). An analysis of temporal-difference learning with function approximation. IEEE Transactions on Automatic Control, 42(5), 674-690.
van Eck, N. J. & van Wezel, M. (2008). Application of reinforcement learning to the game of Othello. Computers & Operations Research, 35(6), 1999-2017.
van Hasselt, H. & Wiering, M. A. (2007). Reinforcement learning in continuous action spaces. In Proceedings of the 2007 IEEE Symposium on Approximate Dynamic Programming and Reinforcement Learning (ADPRL), Honolulu, HI, 1-5 April (pp. 272-279). Retrieved from http://webdocs.cs.ualberta.ca/~vanhasse/papers/Reinforcement_Learning_in_Continuous_Action_Spaces.pdf
van Seijen, H., Whiteson, S., van Hasselt, H., & Wiering, M. (2011). Exploiting best-match equations for efficient reinforcement learning. Journal of Machine Learning Research, 12(Jun), 2045-2094.
Veness, J., Silver, D., Uther, W., & Blair, A. (2009). Bootstrapping from game tree search. In Bengio, Y., Schuurmans, D., Lafferty, J. D., Williams, C. K. I., & Culotta, A. (Eds.), Advances in Neural Information Processing Systems 22 (pp. 1937-1945). Red Hook, NY: Curran Associates, Inc.
Watkins, C. J. C. H. (1989). Learning from delayed rewards. Unpublished PhD dissertation, King's College, Cambridge, England.
Watkins, C. J. C. H. & Dayan, P. (1992). Q-learning. Machine Learning, 8(3-4), 279-292.
Werbos, P. J. (1974). Beyond regression: New tools for prediction and analysis in the behavioural sciences. Unpublished PhD dissertation, Harvard University, Cambridge, MA.
Werbos, P. J. (1989). Backpropagation and neurocontrol: A review and prospectus. In Proceedings of the International Joint Conference on Neural Networks (IJCNN), Washington, D.C., 18-22 June (pp. 209-216). doi: 10.1109/IJCNN.1989.118583
Whiteson, S. & Stone, P. (2006). Evolutionary function approximation for reinforcement learning. Journal of Machine Learning Research, 7, 877-917.
Whiteson, S., Tanner, B., Taylor, M. E., & Stone, P. (2009). Generalized domains for empirical evaluations in reinforcement learning. In Proceedings of the 26th International Conference on Machine Learning: Workshop on Evaluation Methods for Machine Learning, Montreal, Canada, 14-18 June. Retrieved from http://www.site.uottawa.ca/ICML09WS/papers/w8.pdf
Whiteson, S., Taylor, M. E., & Stone, P. (2010). Critical factors in the empirical performance of temporal difference and evolutionary methods for reinforcement learning. Journal of Autonomous Agents and Multi-Agent Systems, 21(1), 1-35.
Whiteson, S., Tanner, B., Taylor, M. E., & Stone, P. (2011). Protecting against evaluation overfitting in empirical reinforcement learning. In Proceedings of the IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL), Paris, France, 11-15 April (pp. 120-127). doi: 10.1109/ADPRL.2011.5967363
Wiering, M. A. (1995). TD learning of game evaluation functions with hierarchical neural architectures. Unpublished master's thesis, Department of Computer Science, University of Amsterdam, Amsterdam, Netherlands.
Wiering, M. A. (2010). Self-play and using an expert to learn to play backgammon with temporal difference learning. Journal of Intelligent Learning Systems & Applications, 2(2), 57-68.
Wiering, M. A. & van Hasselt, H. (2007). Two novel on-policy reinforcement learning algorithms based on TD(λ)-methods. In Proceedings of the IEEE International Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL), Honolulu, HI, 1-5 April (pp. 280-287). doi: 10.1109/ADPRL.2007.368200
Wiering, M. A. & van Hasselt, H. (2008). Ensemble algorithms in reinforcement learning. IEEE Transactions on Systems, Man, and Cybernetics, 38(4), 930-936.
Wiering, M. A., Patist, J. P., & Mannen, H. (2007). Learning to play board games using temporal difference methods (Technical Report UU-CS-2005-048, Institute of Information and Computing Sciences, Utrecht University). Retrieved from http://www.ai.rug.nl/~
Wierstra, D., Foerster, A., Peters, J., & Schmidhuber, J. (2007). Solving deep memory POMDPs with recurrent policy gradients. In Proceedings of the 17th International Conference on Artificial Neural Networks (ICANN), Paris, France, 9-13 September, volume 4668 of Lecture Notes in Computer Science (pp. 697-706). doi: 10.1007/978-3-540-74690-4_71