of different learning algorithms on generalized Gridworld problems that are parame-
terized by the size of the domain, state transition stochasticity, function approximator
coverage, and state observability. A series of basic parameter studies found that do-
main characteristics affect which type of learning algorithm performs best, and that
there may be interactions between domain characteristics and algorithm parameters,
though these results were not statistically analyzed. Whiteson et al. (2011) use gen-
eralized benchmark problems (mountain car, acrobot, and puddle world) to evaluate
state space coverage methods for function approximators; they found that
adaptive tile coding could perform well across all of the test domains, whereas the
performance of a general tile coding scheme was worse and more variable.
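Because the comparison above hinges on tile coding, the following minimal Python sketch illustrates a fixed (non-adaptive) tile coder of the kind such studies use as a baseline. The class name, parameter values, and offset scheme here are illustrative assumptions and are not the configurations evaluated by Whiteson et al. (2011).

import numpy as np

class TileCoder:
    """Fixed (non-adaptive) tile coding: several offset grids laid over a
    continuous state space, yielding sparse binary features for a linear
    value-function approximator."""

    def __init__(self, lows, highs, n_tilings=8, tiles_per_dim=8):
        self.lows = np.asarray(lows, dtype=float)
        self.highs = np.asarray(highs, dtype=float)
        self.n_tilings = n_tilings
        self.tiles_per_dim = tiles_per_dim
        self.dims = len(lows)
        self.tiles_per_tiling = tiles_per_dim ** self.dims
        self.n_features = n_tilings * self.tiles_per_tiling

    def active_features(self, state):
        """Return the index of the single active tile in each tiling."""
        scaled = (np.asarray(state, dtype=float) - self.lows) / (self.highs - self.lows)
        active = []
        for t in range(self.n_tilings):
            # Offset each tiling by a different fraction of one tile width.
            shifted = scaled + t / (self.n_tilings * self.tiles_per_dim)
            coords = np.clip((shifted * self.tiles_per_dim).astype(int),
                             0, self.tiles_per_dim - 1)
            flat = 0
            for c in coords:            # row-major flattening of the grid cell
                flat = flat * self.tiles_per_dim + int(c)
            active.append(t * self.tiles_per_tiling + flat)
        return active

    def value(self, weights, state):
        """Linear value estimate: sum of the weights of the active tiles."""
        return sum(weights[i] for i in self.active_features(state))

# Example: a coder for a 2-D state such as mountain car's (position, velocity),
# using the standard mountain-car state bounds.
coder = TileCoder(lows=[-1.2, -0.07], highs=[0.6, 0.07])
weights = np.zeros(coder.n_features)
print(coder.value(weights, [-0.5, 0.0]))   # 0.0 before any learning

An adaptive tile coder, by contrast, refines where tiles are split as learning proceeds rather than fixing the grid resolution in advance, which is the distinction behind the performance difference reported above.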
References
Albus, J. S. (1975). A new approach to manipulator control: The cerebellar model articulation
controller (CMAC). Journal of Dynamic Systems, Measurement, and Control , 97(3), 220-227.
Aldous, D. (1983). Random walks on finite groups and rapidly mixing Markov chains. In Seminar
on Probability XVII, Lecture Notes in Mathematics Volume 986 (pp. 243-297). Berlin: Springer.
Anderson, C. W. (1987). Strategy learning with multilayer connectionist representations. In Langley,
P. (Ed.), Proceedings of the 4th International Workshop on Machine Learning, Irvine, CA, 22-25
June (pp. 103-114). San Mateo, CA: Morgan Kaufmann.
Atkeson, C. G. & SantamarĂ­a, J. C. (1997). A comparison of direct and model-based rein-
forcement learning. In Proceedings of the IEEE International Conference on Robotics and
Automation (ICRA), Albuquerque, NM, 20-25 April (Vol. 4, pp. 3557-3564). doi:
10.1109/ROBOT.1997.606886
Atkeson, C. G., Moore, A. W., & Schaal, S. (1997). Locally weighted learning. Artificial Intelligence
Review , 11(1-5), 11-73.
Archibald, T. W., McKinnon, K. I. M., & Thomas, L. C. (1995). On the generation of Markov
decision processes. Journal of the Operational Research Society , 46(3), 354-361.
Awate, Y. P. (2009). Policy-gradient based actor-critic algorithms. In Proceedings of the Global
Congress on Intelligent Systems (GCIS), Xiamen, China, 19-21 May (pp. 505-509). doi:
10.1109/GCIS.2009.372
Bagnell, J. A. & Schneider, J. G. (2001). Autonomous helicopter control using reinforcement
learning policy search methods. In Proceedings of the International Conference on Robotics
and Automation, Seoul, Korea, 21-26 May (Vol. 2, pp. 1615-1620). doi:
10.1109/ROBOT.2001.932842
Baird, L. (1995). Residual algorithms: Reinforcement learning with function approximation. In
Prieditis, A. and Russell, S. (Eds.) Proceedings of the 12th International Conference on Ma-
chine Learning (ICML), Tahoe City, CA, 9-12 July (pp. 30-37). San Francisco, CA: Morgan
Kaufmann.
Baird, L. C. (1999). Reinforcement learning through gradient descent. Unpublished PhD
dissertation, Carnegie Mellon University, Pittsburgh, PA.
Bakker, B. (2001). Reinforcement learning with LSTM in non-Markovian tasks with long-term
dependencies (Technical Report, Department of Psychology, Leiden University). Retrieved
from http://staff.science.uva.nl/~bram/RLLSTM_TR.pdf.
Bakker, B. (2007). Reinforcement learning by backpropagation through an LSTM model/critic.
In IEEE International Symposium on Approximate Dynamic Programming and Reinforcement
Learning (ADPRL), Honolulu, HI, 1-5 April (pp. 127-134). doi: 10.1109/ADPRL.2007.368179
Bakker, B. & Schmidhuber, J. (2004). Hierarchical reinforcement learning based on subgoal dis-
covery and subpolicy specialization. In Groen, F., Amato, N., Bonarini, A., Yoshida, E., &