in some circumstances, and it may be useful to explore the subregions that had the
best performance (lowest number of time steps, i.e., subregions 12 and 35). Another
consistency across the convergent subregions was the influence of α, which was
significantly greater than that of all other parameters in most cases. Beyond the first-order
influence of α, however, we found that the two-way interaction structure among
the other parameters is quite different across the subregions, indicating that there
are some differences in the shape or form of the response surfaces.
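As a concrete illustration of the kind of analysis described above, the sketch below estimates first-order (main) effects and two-way interaction effects from a two-level full factorial design. This is not the authors' actual analysis code; the function name and the coded-level (-1/+1) convention are assumptions.

```python
import itertools

import numpy as np


def factorial_effects(levels, response):
    """Estimate main effects and two-way interactions from a
    two-level full factorial design.

    levels   : (n_runs, n_params) array of coded factor levels (-1/+1)
    response : (n_runs,) array of the measured response
               (e.g., number of time steps to convergence)
    """
    levels = np.asarray(levels, dtype=float)
    y = np.asarray(response, dtype=float)
    n_params = levels.shape[1]

    # Main effect of factor i: mean response at +1 minus mean at -1.
    main = {
        i: y[levels[:, i] > 0].mean() - y[levels[:, i] < 0].mean()
        for i in range(n_params)
    }

    # Two-way interaction of factors (i, j): the same contrast applied
    # to the elementwise product of the two coded columns.
    twoway = {}
    for i, j in itertools.combinations(range(n_params), 2):
        prod = levels[:, i] * levels[:, j]
        twoway[(i, j)] = y[prod > 0].mean() - y[prod < 0].mean()

    return main, twoway
```

Comparing the `twoway` dictionaries fitted separately within each convergent subregion would reveal the kind of subregion-to-subregion differences in interaction structure noted above.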
The TBU problem used in this work initialized the state of the truck to within a specific region and orientation, and it allowed for somewhat loose tolerances to achieve
the goal location and orientation relative to a real-world implementation. However,
we believe that the problem characteristics used herein are a good start to learning
the TBU. The strategies learned from within any of the convergent subregions could
then be used in a subsequent training scheme that is aimed at either generalizing the
initial conditions of the truck, reducing the goal tolerances, or both. Such a sequential
training scheme, which goes beyond using the TD(λ) algorithm alone, is regarded
as a heuristic. Although this is an interesting strategy for improving reinforcement
learning, and will likely be explored in future work, it is beyond the scope of the
current work.
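The sequential training scheme suggested above can be sketched as a staged loop in which each stage warm-starts from the weights of the previous one while the task is made harder. This is a minimal illustration only: `train_episode`, the stage tuple layout, and the easy-to-hard ordering are all assumptions, and the callable stands in for whatever per-episode TD(λ) update is actually used.

```python
def curriculum_training(train_episode, stages):
    """Sketch of a sequential (curriculum) training scheme.

    Each stage widens the truck's initial-state region and/or tightens
    the goal tolerance, reusing the weights learned in the prior stage.

    train_episode : callable(weights, init_range, goal_tol) -> weights
                    (hypothetical; one episode of TD(lambda) training)
    stages        : list of (init_range, goal_tol, n_episodes) tuples,
                    ordered from easy to hard
    """
    weights = None  # first stage trains from scratch
    for init_range, goal_tol, n_episodes in stages:
        for _ in range(n_episodes):
            # Later stages warm-start from the previous stage's weights.
            weights = train_episode(weights, init_range, goal_tol)
    return weights
```

In this framing, a strategy learned within one of the convergent subregions would supply the initial weights for the first stage, and subsequent stages would generalize the initial conditions or reduce the goal tolerances.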