of different learning algorithms on generalized Gridworld problems that are parame-
terized by the size of the domain, state transition stochasticity, function approximator
coverage, and state observability. A series of basic parameter studies found that do-
main characteristics affect which type of learning algorithm performs best, and that
there may be interactions between domain characteristics and algorithm parameters,
though these results were not statistically analyzed. Whiteson et al. (2011) use gen-
eralized benchmark problems (mountain car, acrobot, and puddle world) to evaluate
state space coverage methods for function approximators; they found that
adaptive tile coding could perform well across all of the test domains, whereas the
performance of a general tile coding scheme was worse and more variable.
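Because the comparison above hinges on tile coding, the following minimal Python sketch illustrates a fixed (non-adaptive) tile coder of the kind such studies use as a baseline. The class name, parameter values, and offset scheme here are illustrative assumptions and are not the configurations evaluated by Whiteson et al. (2011).

import numpy as np

class TileCoder:
    """Fixed (non-adaptive) tile coding: several offset grids laid over a
    continuous state space, yielding sparse binary features for a linear
    value-function approximator."""

    def __init__(self, lows, highs, n_tilings=8, tiles_per_dim=8):
        self.lows = np.asarray(lows, dtype=float)
        self.highs = np.asarray(highs, dtype=float)
        self.n_tilings = n_tilings
        self.tiles_per_dim = tiles_per_dim
        self.dims = len(lows)
        self.tiles_per_tiling = tiles_per_dim ** self.dims
        self.n_features = n_tilings * self.tiles_per_tiling

    def active_features(self, state):
        """Return the index of the single active tile in each tiling."""
        scaled = (np.asarray(state, dtype=float) - self.lows) / (self.highs - self.lows)
        active = []
        for t in range(self.n_tilings):
            # Offset each tiling by a different fraction of one tile width.
            shifted = scaled + t / (self.n_tilings * self.tiles_per_dim)
            coords = np.clip((shifted * self.tiles_per_dim).astype(int),
                             0, self.tiles_per_dim - 1)
            flat = 0
            for c in coords:            # row-major flattening of the grid cell
                flat = flat * self.tiles_per_dim + int(c)
            active.append(t * self.tiles_per_tiling + flat)
        return active

    def value(self, weights, state):
        """Linear value estimate: sum of the weights of the active tiles."""
        return sum(weights[i] for i in self.active_features(state))

# Example: a coder for a 2-D state such as mountain car's (position, velocity),
# using the standard mountain-car state bounds.
coder = TileCoder(lows=[-1.2, -0.07], highs=[0.6, 0.07])
weights = np.zeros(coder.n_features)
print(coder.value(weights, [-0.5, 0.0]))   # 0.0 before any learning

An adaptive tile coder, by contrast, refines where tiles are split as learning proceeds rather than fixing the grid resolution in advance, which is the distinction behind the performance difference reported above.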
References
Albus, J. S. (1975). A new approach to manipulator control: The cerebellar model articulation
controller (CMAC). Journal of Dynamic Systems, Measurement, and Control , 97(3), 220-227.
Aldous, D. (1983). Random walks on finite groups and rapidly mixing Markov chains. In Seminar
on Probability XVII, Lecture Notes in Mathematics Volume 986 (pp. 243-297). Berlin: Springer.
Anderson, C. W. (1987). Strategy learning with multilayer connectionist representations. In Langley,
P. (Ed.), Proceedings of the 4th International Workshop on Machine Learning, Irvine, CA, 22-25
June (pp. 103-114). San Mateo, CA: Morgan Kaufmann.
Atkeson, C. G. & SantamarĂ­a, J. C. (1997). A comparison of direct and model-based rein-
forcement learning. In Proceedings of the IEEE International Conference on Robotics and
Automation (ICRA), Albuquerque, NM, 20-25 April (Vol. 4, pp. 3557-3564). doi:
10.1109/ROBOT.1997.606886
Atkeson, C. G., Moore, A. W., & Schaal, S. (1997). Locally weighted learning. Artificial Intelligence
Review , 11(1-5), 11-73.
Archibald, T. W., McKinnon, K. I. M., & Thomas, L. C. (1995). On the generation of Markov
decision processes. Journal of the Operational Research Society , 46(3), 354-361.
Awate, Y. P. (2009). Policy-gradient based actor-critic algorithms. In Proceedings of the Global
Congress on Intelligent Systems (GCIS), Xiamen, China, 19-21 May (pp. 505-509). doi:
10.1109/GCIS.2009.372
Bagnell, J. A. & Schneider, J. G. (2001). Autonomous helicopter control using reinforcement
learning policy search methods. In Proceedings of the International Conference on Robotics
and Automation, Seoul, Korea, 21-26 May (Vol. 2, pp. 1615-1620). doi:
10.1109/ROBOT.2001.932842
Baird, L. (1995). Residual algorithms: Reinforcement learning with function approximation. In
Prieditis, A. and Russell, S. (Eds.) Proceedings of the 12th International Conference on Ma-
chine Learning (ICML), Tahoe City, CA, 9-12 July (pp. 30-37). San Francisco, CA: Morgan
Kaufmann.
Baird, L. C. (1999). Reinforcement learning through gradient descent. Unpublished PhD
dissertation, Carnegie Mellon University, Pittsburgh, PA.
Bakker, B. (2001). Reinforcement learning with LSTM in non-Markovian tasks with long-term
dependencies (Technical Report, Department of Psychology, Leiden University). Retrieved
from http://staff.science.uva.nl/~bram/RLLSTM_TR.pdf.
Bakker, B. (2007). Reinforcement learning by backpropagation through an LSTM model/critic.
In IEEE International Symposium on Approximate Dynamic Programming and Reinforcement
Learning (ADPRL), Honolulu, HI, 1-5 April (pp. 127-134). doi: 10.1109/ADPRL.2007.368179
Bakker, B. & Schmidhuber, J. (2004). Hierarchical reinforcement learning based on subgoal dis-
covery and subpolicy specialization. In Groen, F., Amato, N., Bonarini, A., Yoshida, E., &