Powell, W. B. (2008). What you should know about approximate dynamic programming. Naval
Research Logistics, 56(3), 239-249.
Powell, W. B. & Ma, J. (2011). A review of stochastic algorithms with continuous value function
approximation and some new approximate policy iteration algorithms for multidimensional
continuous applications. Journal of Control Theory and Applications, 9(3), 336-352.
Proper, S. & Tadepalli, P. (2006). Scaling model-based average-reward reinforcement learning for
product delivery. In Machine Learning: European Conference on Machine Learning (ECML
2006), Berlin, Germany, 18-22 September (pp. 735-742). doi: 10.1007/11871842_74
Rescorla, R. A. & Wagner, A. R. (1972). A theory of Pavlovian conditioning: Variations in the
effectiveness of reinforcement and nonreinforcement. In Black, A. H. & Prokasy, W. F. (Eds.),
Classical Conditioning II: Current research and theory (pp. 64-99). New York, NY: Appleton-
Century-Crofts.
Riedmiller, M. (2005). Neural fitted Q iteration—First experiences with a data efficient neural
reinforcement learning method. In Gama, J., Camacho, R., Brazdil, P. B., Jorge, A. M., &
Torgo, L. (Eds.), Proceedings of the 16th European Conference on Machine Learning (ECML
2005), Porto, Portugal, 3-7 October (pp. 317-328). doi: 10.1007/11564096_32
Rumelhart, D. E., Hinton, G. E., & Williams, R. J. (1986). Learning internal representations by error
propagation. In Rumelhart, D. E. & McClelland, J. L. (Eds.), Parallel Distributed Processing:
Explorations in the Microstructure of Cognition. Cambridge, MA: MIT Press.
Rummery, G. A. & Niranjan, M. (1994). On-line Q-learning using connectionist systems (Technical
Report CUED/F-INFENG/TR 166, Engineering Department, Cambridge University). Retrieved
from http://mi.eng.cam.ac.uk/reports/svr-ftp/auto-pdf/rummery_tr166.pdf
Runarsson, T. P. & Lucas, S. M. (2005). Co-evolution versus self-play temporal difference learn-
ing for acquiring position evaluation in small-board Go. IEEE Transactions on Evolutionary
Computation, 9(6), 628-640.
Schaeffer, J., Hlynka, M., & Jussila, V. (2001). Temporal difference learning applied to a high-
performance game-playing program. In Proceedings of the 17th International Joint Conference
on Artificial Intelligence (IJCAI), Seattle, WA, 4-10 August (Vol. 1, pp. 529-534). San Francisco,
CA: Morgan Kaufmann.
Schmidhuber, J. (2005). Completely self-referential optimal reinforcement learners. In Proceedings
of the International Conference on Artificial Neural Networks (ICANN), Warsaw, Poland, 11-15
September, volume 3697 of Lecture Notes in Computer Science (pp. 223-233). Berlin: Springer.
Schmidhuber, J. (2006). Gödel machines: Fully self-referential optimal universal self-improvers.
In Goertzel, B. & Pennachin, C. (Eds.), Artificial General Intelligence (pp. 199-226). doi:
10.1007/11550907_36
Schraudolph, N. N., Dayan, P., & Sejnowski, T. J. (1994). Temporal difference learning of position
evaluation in the game of Go. In Cowan, J. D., Tesauro, G., & Alspector, J. (Eds.), Advances in
Neural Information Processing Systems 6. San Francisco, CA: Morgan Kaufmann.
Silver, D., Sutton, R. S., & Müller, M. (2012). Temporal-difference search in computer Go. Machine
Learning, 87(2), 183-219.
Şimşek, Ö. & Barto, A. G. (2004). Using relative novelty to identify useful temporal abstractions
in reinforcement learning. In Proceedings of the 21st International Conference on Machine
Learning, Banff, Alberta, Canada, 4-8 July (pp. 751-758). doi: 10.1145/1015330.1015353
Singh, S. P., Jaakkola, T., & Jordan, M. I. (1994). Learning without state-estimation in partially
observable Markovian decision processes. In Proceedings of the 11th International Conference
on Machine Learning (ICML), New Brunswick, NJ, 10-13 July (pp. 284-292). San Francisco,
CA: Morgan Kaufmann.
Singh, S. P., Jaakkola, T., & Jordan, M. I. (1995). Reinforcement learning with soft state aggregation.
In Advances in Neural Information Processing Systems 7 (pp. 361-368). Cambridge, MA: MIT
Press.
Singh, S. P. & Sutton, R. S. (1996). Reinforcement learning with replacing eligibility traces. Machine
Learning, 22(1-3), 123-158.
Skelly, M. M. (2004). Hierarchical reinforcement learning with function approximation for adaptive
control. Unpublished PhD dissertation, Case Western Reserve University, Cleveland, OH.