The use of long short-term memory (LSTM) (Hochreiter and Schmidhuber 1997) with recurrent neural networks has been shown to have considerable promise over basic recurrent neural networks (Gers 2001; Bakker 2001, 2007; Wierstra et al. 2010). Other types of neural networks that have been used for reinforcement learning include modular neural networks (Schraudolph et al. 1994), cascading neural networks (Nissen 2007), self-organizing maps (Touzet 1997; Smith 2002; Tan et al. 2008; Montazeri et al. 2011; Osana 2011), and explanation-based neural networks (EBNN) (Mitchell and Thrun 1992; Thrun 1995).
In spite of the implementation challenges and the lack of convergence proofs, neural networks are still a commonly used function approximator for applications of reinforcement learning. A large proportion of applications involve benchmark or toy domains, such as the inverted pendulum problem (Anderson 1987), single and double cart-pole balancing (Igel 2003; van Hasselt and Wiering 2007; Hans and Udluft 2010), the pole swing-up task (Gabel et al. 2011), keep-away (Whiteson et al. 2010), the mountain car problem (Wiering and van Hasselt 2007; Whiteson et al. 2010), and maze problems (Wiering and van Hasselt 2007). Neural networks are a common representation for learning games due to their ability to generalize, and applications include games such as Tic-tac-toe (Wiering 1995; Konen and Beielstein 2009; Gatti and Embrechts 2012), Chung-Toi (a variant of Tic-tac-toe) (Gatti et al. 2011a, b), checkers (Patist and Wiering 2004; Wiering et al. 2007), chess (Thrun 1995; Mannen and Wiering 2004; Wiering et al. 2007), Othello (van Eck and van Wezel 2008), Go (Runarsson and Lucas 2005), and backgammon (Tesauro 1995; Wiering et al. 2007; Wiering 2010; Papahristou and Refanidis 2011). Finally, the use of neural networks for reinforcement learning in the real world (including real-world conceptual problems) includes applications such as control problems (Werbos 1989; Mitchell and Thrun 1992; Yamada 2011), stock price prediction (Lee 2001) and trading (Gorse 2011), product delivery and distribution (Proper and Tadepalli 2006), resource allocation (Tesauro et al. 2007), job-shop scheduling (Gabel and Riedmiller 2007), and technical process control (Hafner and Riedmiller 2011).
2.2.3 Learning Algorithms
The goal of learning algorithms in reinforcement learning is essentially to allow the agent to learn the dynamics of the environment so that an optimal set of actions may be selected to achieve a goal or obtain the greatest total reward. While numerous conceptual approaches, and thus numerous learning algorithms, have been developed for learning this environment-action mapping (see Sutton and Barto (1998), Szepesvári (2010), and Powell and Ma (2011) for comprehensive reviews of learning algorithms), in this work we focus on value function learning methods. In these methods, the agent attempts to learn a value function that approximates the value (i.e., utility) of states, though we briefly mention other reinforcement learning algorithms that have been developed.
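To make the value function idea concrete, the following is a minimal sketch of tabular TD(0) value learning on a hypothetical one-dimensional corridor task; the domain, the random-walk policy, and all parameter values are illustrative assumptions rather than details taken from this text.

    import random

    # Toy corridor: states 0..N, where reaching state N ends the
    # episode with reward +1. All names and values here are
    # hypothetical, chosen only to illustrate the update rule.
    N = 10               # terminal state index
    alpha = 0.1          # learning rate
    gamma = 0.95         # discount factor
    V = [0.0] * (N + 1)  # value estimate per state; V[N] stays 0 (terminal)

    for episode in range(5000):
        s = 0
        while s != N:
            # Random-walk policy: step left or right, clamped at state 0.
            s_next = max(0, s + random.choice([-1, 1]))
            r = 1.0 if s_next == N else 0.0
            # TD(0) update: move V(s) toward the bootstrapped
            # target r + gamma * V(s').
            V[s] += alpha * (r + gamma * V[s_next] - V[s])
            s = s_next

    # Learned values increase toward the rewarding terminal state.
    print([round(v, 2) for v in V])

Replacing the table V with a neural network that maps a state encoding to a scalar value, and the tabular update with a gradient step toward the same TD target, yields the function-approximation setting discussed in the preceding section.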
The representation (i.e., agent, defined in Sect. 2.2.2) serves two purposes in reinforcement learning. The first purpose is to learn about the dynamics of the domain and how the selection of actions relates to feedback delivered by the domain. In other