Tsitsiklis, J. N. & Van Roy, B. (1997). An analysis of temporal-difference learning with function
approximation. IEEE Transactions on Automatic Control, 42(5), 674-690.
van Eck, N. J. & van Wezel, M. (2008). Application of reinforcement learning to the game of
Othello. Computers & Operations Research, 35(6), 1999-2017.
van Hasselt, H. & Wiering, M. A. (2007). Reinforcement learning in continuous action
spaces. In Proceedings of the 2007 IEEE Symposium on Approximate Dynamic Program-
ming and Reinforcement Learning (ADPRL), Honolulu, HI, 1-5 April (pp. 272-279). Retrieved
from http://webdocs.cs.ualberta.ca/~vanhasse/papers/Reinforcement_Learning_in_Continuous_Action_Spaces.pdf
van Seijen, H., Whiteson, S., van Hasselt, H., & Wiering, M. (2011). Exploiting best-match
equations for efficient reinforcement learning. Journal of Machine Learning Research, 12(Jun),
2045-2094.
Veness, J., Silver, D., Uther, W., & Blair, A. (2009). Bootstrapping from game tree search. In Bengio,
Y., Schuurmans, D., Lafferty, J. D., Williams, C. K. I., & Culotta, A. (Eds.), Advances in Neural
Information Processing Systems 22 (pp. 1937-1945). Red Hook, NY: Curran Associates, Inc.
Watkins, C. J. C. H. (1989). Learning from delayed rewards. Unpublished PhD dissertation, King's
College, Cambridge, England.
Watkins, C. J. C. H. & Dayan, P. (1992). Q-learning. Machine Learning, 8(3-4), 279-292.
Werbos, P. J. (1974). Beyond regression: New tools for prediction and analysis in the behavioral
sciences. Unpublished PhD dissertation, Harvard University, Cambridge, MA.
Werbos, P. J. (1989). Backpropagation and neurocontrol: A review and prospectus. In Proceedings
of the International Joint Conference on Neural Networks (IJCNN), Washington, D.C., 18-22
June (pp. 209-216). doi: 10.1109/IJCNN.1989.118583
Whiteson, S. & Stone, P. (2006). Evolutionary function approximation for reinforcement learning.
Journal of Machine Learning Research, 7, 877-917.
Whiteson, S., Tanner, B., Taylor, M. E., & Stone, P. (2009). Generalized domains for empirical
evaluations in reinforcement learning. In Proceedings of the 26th International Conference on
Machine Learning: Workshop on Evaluation Methods for Machine Learning, Montreal, Canada,
14-18 June. Retrieved from http://www.site.uottawa.ca/ICML09WS/papers/w8.pdf
Whiteson, S., Taylor, M. E., & Stone, P. (2010). Critical factors in the empirical performance of tem-
poral difference and evolutionary methods for reinforcement learning. Autonomous Agents
and Multi-Agent Systems, 21(1), 1-35.
Whiteson, S., Tanner, B., Taylor, M. E., & Stone, P. (2011). Protecting against evaluation overfit-
ting in empirical reinforcement learning. In Proceedings of the IEEE Symposium on Adaptive
Dynamic Programming and Reinforcement Learning (ADPRL), Paris, France, 11-15 April
(pp. 120-127). doi: 10.1109/ADPRL.2011.5967363
Wiering, M. A. (1995). TD learning of game evaluation functions with hierarchical neural architec-
tures. Unpublished master's thesis, Department of Computer Science, University of Amsterdam,
Amsterdam, Netherlands.
Wiering, M. A. (2010). Self-play and using an expert to learn to play backgammon with temporal
difference learning. Journal of Intelligent Learning Systems & Applications, 2(2), 57-68.
Wiering, M. A. & van Hasselt, H. (2007). Two novel on-policy reinforcement learning algorithms
based on TD(λ)-methods. In Proceedings of the IEEE International Symposium on Adap-
tive Dynamic Programming and Reinforcement Learning (ADPRL), Honolulu, HI, 1-5 April
(pp. 280-287). doi: 10.1109/ADPRL.2007.368200
Wiering, M. A. & van Hasselt, H. (2008). Ensemble algorithms in reinforcement learning. IEEE
Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), 38(4), 930-936.
Wiering, M. A., Patist, J. P., & Mannen, H. (2007). Learning to play board games using
temporal difference methods (Technical Report UU-CS-2005-048, Institute of Informa-
tion and Computing Sciences, Utrecht University). Retrieved from
http://www.ai.rug.nl/~mwiering/GROUP/ARTICLES/learning_games_TR.pdf
Wierstra, D., Foerster, A., Peters, J., & Schmidhuber, J. (2007). Solving deep memory POMDPs
with recurrent policy gradients. In Proceedings of the 17th International Conference on Artificial
Neural Networks (ICANN), Porto, Portugal, 9-13 September, volume 4668 of Lecture Notes in
Computer Science (pp. 697-706). doi: 10.1007/978-3-540-74690-4_71