Powell, W. B. (2008). What you should know about approximate dynamic programming. Naval
Research Logistics, 56(3), 239-249.
Powell, W. B. & Ma, J. (2011). A review of stochastic algorithms with continuous value function
approximation and some new approximate policy iteration algorithms for multidimensional
continuous applications. Journal of Control Theory and Applications, 9(3), 336-352.
Proper, S. & Tadepalli, P. (2006). Scaling model-based average-reward reinforcement learning for
product delivery. In Machine Learning: European Conference on Machine Learning (ECML
2006), Berlin, Germany, 18-22 September (pp. 735-742). doi: 10.1007/11871842_74
Rescorla, R. A. & Wagner, A. R. (1972). A theory of Pavlovian conditioning: Variations in the
effectiveness of reinforcement and nonreinforcement. In Black, A. H. & Prokasy, W. F. (Eds.),
Classical Conditioning II: Current research and theory (pp. 64-99). New York, NY: Appleton-
Century-Crofts.
Riedmiller, M. (2005). Neural fitted Q iteration—First experiences with a data efficient neural
reinforcement learning method. In Gama, J., Camacho, R., Brazdil, P. B., Jorge, A. M., &
Torgo, L. (Eds.), Proceedings of the 16th European Conference on Machine Learning (ECML
2005), Porto, Portugal, 3-7 October (pp. 317-328). doi: 10.1007/11564096_32
Rumelhart, D. E., Hinton, G. E., & Williams, R. J. (1986). Learning internal representations by error
propagation. In Rumelhart, D. E. & McClelland, J. L. (Eds.), Parallel Distributed Processing:
Explorations in the Microstructure of Cognition. Cambridge, MA: MIT Press.
Rummery, G. A. & Niranjan, M. (1994). On-line Q-learning using connectionist systems (Technical
Report CUED/F-INFENG/TR 166, Engineering Department, Cambridge University). Retrieved
from http://mi.eng.cam.ac.uk/reports/svr-ftp/auto-pdf/rummery_tr166.pdf
Runarsson, T. P. & Lucas, S. M. (2005). Co-evolution versus self-play temporal difference learn-
ing for acquiring position evaluation in small-board Go. IEEE Transactions on Evolutionary
Computation, 9(6), 628-640.
Schaeffer, J., Hlynka, M., & Jussila, V. (2001). Temporal difference learning applied to a high-
performance game-playing program. In Proceedings of the 17th International Joint Conference
on Artificial Intelligence (IJCAI), Seattle, WA, 4-10 August (Vol. 1, pp. 529-534). San Francisco,
CA: Morgan Kaufmann.
Schmidhuber, J. (2005). Completely self-referential optimal reinforcement learners. In Proceedings
of the International Conference on Artificial Neural Networks (ICANN), Warsaw, Poland, 11-15
September, volume 3697 of Lecture Notes in Computer Science (pp. 223-233). Berlin: Springer.
Schmidhuber, J. (2006). Gödel machines: Fully self-referential optimal universal self-improvers.
In Goertzel, B. & Pennachin, C. (Eds.), Artificial General Intelligence (pp. 199-226). doi:
10.1007/11550907_36
Schraudolph, N. N., Dayan, P., & Sejnowski, T. J. (1994). Temporal difference learning of position
evaluation in the game of Go. In Cowan, J. D., Tesauro, G., & Alspector, J. (Eds.), Advances in
Neural Information Processing Systems 6. San Francisco, CA: Morgan Kaufmann.
Silver, D., Sutton, R. S., & Müller, M. (2012). Temporal-difference search in computer Go. Machine
Learning, 87(2), 183-219.
Şimşek, Ö. & Barto, A. G. (2004). Using relative novelty to identify useful temporal abstractions
in reinforcement learning. In Proceedings of the 21st International Conference on Machine
Learning, Banff, Alberta, Canada, 4-8 July (pp. 751-758). doi: 10.1145/1015330.1015353
Singh, S. P., Jaakkola, T., & Jordan, M. I. (1994). Learning without state-estimation in partially
observable Markovian decision processes. In Proceedings of the 11th International Conference
on Machine Learning (ICML), New Brunswick, NJ, 10-13 July (pp. 284-292). San Francisco,
CA: Morgan Kaufmann.
Singh, S. P., Jaakkola, T., & Jordan, M. I. (1995). Reinforcement learning with soft state aggregation.
In Advances in Neural Information Processing Systems 7 (pp. 361-368). Cambridge, MA: MIT
Press.
Singh, S. P. & Sutton, R. S. (1996). Reinforcement learning with replacing eligibility traces. Machine
Learning, 22(1-3), 123-158.
Skelly, M. M. (2004). Hierarchical reinforcement learning with function approximation for adaptive
control. Unpublished PhD dissertation, Case Western Reserve University, Cleveland, OH.