The use of long short-term memory (LSTM) (Hochreiter and Schmidhuber 1997) with recurrent neural networks has been shown to have considerable promise over basic recurrent neural networks (Gers 2001; Bakker 2001, 2007; Wierstra et al. 2010). Other types of neural networks that have been used for reinforcement learning include modular neural networks (Schraudolph et al. 1994), cascading neural networks (Nissen 2007), self-organizing maps (Touzet 1997; Smith 2002; Tan et al. 2008; Montazeri et al. 2011; Osana 2011), and explanation-based neural networks (EBNN) (Mitchell and Thrun 1992; Thrun 1995).
In spite of the implementation challenges and the lack of convergence proofs, neural networks remain a commonly used function approximator for applications of reinforcement learning. A large proportion of applications are to benchmark or toy domains, such as the inverted pendulum problem (Anderson 1987), single and double cart-pole balancing (Igel 2003; van Hasselt and Wiering 2007; Hans and Udluft 2010), the pole swing-up task (Gabel et al. 2011), keep-away (Whiteson et al. 2010), the mountain car problem (Wiering and van Hasselt 2007; Whiteson et al. 2010), and maze problems (Wiering and van Hasselt 2007). Neural networks are a common representation for learning games due to their ability to generalize, and applications include games such as Tic-tac-toe (Wiering 1995; Konen and Beielstein 2009; Gatti and Embrechts 2012), Chung-Toi (a variant of Tic-tac-toe) (Gatti et al. 2011a, b), checkers (Patist and Wiering 2004; Wiering et al. 2007), chess (Thrun 1995; Mannen and Wiering 2004; Wiering et al. 2007), Othello (van Eck and van Wezel 2008), Go (Runarsson and Lucas 2005), and backgammon (Tesauro 1995; Wiering et al. 2007; Wiering 2010; Papahristou and Refanidis 2011). Finally, the use of neural networks for reinforcement learning in the real world (including real-world conceptual problems) includes applications such as control problems (Werbos 1989; Mitchell and Thrun 1992; Yamada 2011), stock price prediction (Lee 2001) and trading (Gorse 2011), product delivery and distribution (Proper and Tadepalli 2006), resource allocation (Tesauro et al. 2007), jobshop scheduling (Gabel and Riedmiller 2007), and technical process control (Hafner and Riedmiller 2011).
2.2.3 Learning Algorithms
The goal of learning algorithms in reinforcement learning is essentially to allow the
agent to learn the dynamics of the environment so that an optimal set of actions may
be selected to achieve a goal or obtain the greatest total reward. While numerous con-
ceptual approaches, and thus numerous learning algorithms, have been developed for
learning this environment-action mapping (see Sutton and Barto (1998), Szepesvári (2010), and Powell and Ma (2011) for comprehensive reviews of learning algorithms), in
this work we focus on value function learning methods. In these methods, the agent
attempts to learn a value function that approximates the value (i.e., utility) of states,
though we briefly mention other reinforcement learning algorithms that have been
developed.
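To make the value function learning setting concrete, the following is a minimal sketch, not drawn from this text, of semi-gradient TD(0) learning of a state-value function with a small neural network on a hypothetical random-walk chain. The environment, the ValueNetwork class, and the parameter values (GAMMA, ALPHA, the network size) are illustrative assumptions rather than the methods used in this work.

import numpy as np

rng = np.random.default_rng(0)

N_STATES = 5    # states 0..4; state 4 is terminal and yields reward +1
GAMMA = 0.9     # discount factor (assumed value)
ALPHA = 0.01    # learning rate (assumed value)

def encode(state):
    # One-hot encoding of a state index.
    x = np.zeros(N_STATES)
    x[state] = 1.0
    return x

class ValueNetwork:
    # Single-hidden-layer network approximating the state value V(s).
    def __init__(self, n_in, n_hidden=8):
        self.W1 = rng.normal(0.0, 0.1, (n_hidden, n_in))
        self.W2 = rng.normal(0.0, 0.1, n_hidden)

    def forward(self, x):
        self.h = np.tanh(self.W1 @ x)   # cache hidden activations for the update
        return float(self.W2 @ self.h)

    def update(self, x, td_error):
        # Semi-gradient TD(0): move the weights along dV/dw scaled by the TD error.
        grad_W2 = self.h
        grad_W1 = np.outer(self.W2 * (1.0 - self.h ** 2), x)
        self.W2 += ALPHA * td_error * grad_W2
        self.W1 += ALPHA * td_error * grad_W1

net = ValueNetwork(N_STATES)

for episode in range(2000):
    s = 0
    while s != N_STATES - 1:
        s_next = min(s + int(rng.integers(1, 3)), N_STATES - 1)  # random step right
        r = 1.0 if s_next == N_STATES - 1 else 0.0
        v_next = 0.0 if s_next == N_STATES - 1 else net.forward(encode(s_next))
        v = net.forward(encode(s))   # forward last so the cached activations match s
        net.update(encode(s), r + GAMMA * v_next - v)
        s = s_next

# The learned values should increase toward the terminal state.
print([round(net.forward(encode(s)), 3) for s in range(N_STATES)])

The same basic scheme underlies the value function methods discussed here; what varies across them is the function approximator and the form of the temporal-difference target.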
The representation (i.e., the agent, defined in Sect. 2.2.2) serves two purposes in reinforcement learning. The first purpose is to learn about the dynamics of the domain and how the selection of actions relates to the feedback delivered by the domain. In other