Tsitsiklis, J. N. & Van Roy, B. (1997). An analysis of temporal-difference learning with function approximation. IEEE Transactions on Automatic Control, 42(5), 674-690.
van Eck, N. J. & van Wezel, M. (2008). Application of reinforcement learning to the game of Othello. Computers & Operations Research, 35(6), 1999-2017.
van Hasselt, H. & Wiering, M. A. (2007). Reinforcement learning in continuous action spaces. In Proceedings of the 2007 IEEE Symposium on Approximate Dynamic Programming and Reinforcement Learning (ADPRL), Honolulu, HI, 1-5 April (pp. 272-279). Retrieved from http://webdocs.cs.ualberta.ca/~vanhasse/papers/Reinforcement_Learning_in_Continuous_Action_Spaces.pdf
van Seijen, H., Whiteson, S., van Hasselt, H., & Wiering, M. (2011). Exploiting best-match equations for efficient reinforcement learning. Journal of Machine Learning Research, 12(Jun), 2045-2094.
Veness, J., Silver, D., Uther, W., & Blair, A. (2009). Bootstrapping from game tree search. In Bengio, Y., Schuurmans, D., Lafferty, J. D., Williams, C. K. I., & Culotta, A. (Eds.), Advances in Neural Information Processing Systems 22 (pp. 1937-1945). Red Hook, NY: Curran Associates, Inc.
Watkins, C. J. C. H. (1989). Learning from delayed rewards. Unpublished PhD dissertation, King's College, Cambridge, England.
Watkins, C. J. C. H. & Dayan, P. (1992). Q-learning. Machine Learning, 8(3-4), 279-292.
Werbos, P. J. (1974). Beyond regression: New tools for prediction and analysis in the behavioural sciences. Unpublished PhD dissertation, Harvard University, Cambridge, MA.
Werbos, P. J. (1989). Backpropagation and neurocontrol: A review and prospectus. In Proceedings of the International Joint Conference on Neural Networks (IJCNN), Washington, D.C., 18-22 June (pp. 209-216). doi: 10.1109/IJCNN.1989.118583
Whiteson, S. & Stone, P. (2006). Evolutionary function approximation for reinforcement learning. Journal of Machine Learning Research, 7, 877-917.
Whiteson, S., Tanner, B., Taylor, M. E., & Stone, P. (2009). Generalized domains for empirical evaluations in reinforcement learning. In Proceedings of the 26th International Conference on Machine Learning: Workshop on Evaluation Methods for Machine Learning, Montreal, Canada, 14-18 June. Retrieved from http://www.site.uottawa.ca/ICML09WS/papers/w8.pdf
Whiteson, S., Taylor, M. E., & Stone, P. (2010). Critical factors in the empirical performance of temporal difference and evolutionary methods for reinforcement learning. Journal of Autonomous Agents and Multi-Agent Systems, 21(1), 1-35.
Whiteson, S., Tanner, B., Taylor, M. E., & Stone, P. (2011). Protecting against evaluation overfitting in empirical reinforcement learning. In Proceedings of the IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL), Paris, France, 11-15 April (pp. 120-127). doi: 10.1109/ADPRL.2011.5967363
Wiering, M. A. (1995). TD learning of game evaluation functions with hierarchical neural architectures. Unpublished master's thesis, Department of Computer Science, University of Amsterdam, Amsterdam, Netherlands.
Wiering, M. A. (2010). Self-play and using an expert to learn to play backgammon with temporal difference learning. Journal of Intelligent Learning Systems & Applications, 2(2), 57-68.
Wiering, M. A. & van Hasselt, H. (2007). Two novel on-policy reinforcement learning algorithms based on TD(λ)-methods. In Proceedings of the IEEE International Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL), Honolulu, HI, 1-5 April (pp. 280-287). doi: 10.1109/ADPRL.2007.368200
Wiering, M. A. & van Hasselt, H. (2008). Ensemble algorithms in reinforcement learning. IEEE Transactions on Systems, Man, and Cybernetics, 38(4), 930-936.
Wiering, M. A., Patist, J. P., & Mannen, H. (2007). Learning to play board games using temporal difference methods (Technical Report UU-CS-2005-048, Institute of Information and Computing Sciences, Utrecht University). Retrieved from http://www.ai.rug.nl/~
Wierstra, D., Foerster, A., Peters, J., & Schmidhuber, J. (2007). Solving deep memory POMDPs with recurrent policy gradients. In Proceedings of the 17th International Conference on Artificial Neural Networks (ICANN), Paris, France, 9-13 September, volume 4668 of Lecture Notes in Computer Science (pp. 697-706). doi: 10.1007/978-3-540-74690-4_71