The use of long short-term memory (LSTM) (Hochreiter and Schmidhuber 1997) with recurrent neural networks has been shown to have considerable promise over basic recurrent neural networks (Gers 2001; Bakker 2001, 2007; Wierstra et al. 2010). Other types of neural networks that have been used for reinforcement learning include modular neural networks (Schraudolph et al. 1994), cascading neural networks (Nissen 2007), self-organizing maps (Touzet 1997; Smith 2002; Tan et al. 2008; Montazeri et al. 2011; Osana 2011), and explanation-based neural networks (EBNN) (Mitchell and Thrun 1992; Thrun 1995).
In spite of the implementation challenges and the lack of convergence proofs, neural networks are still a commonly used function approximator for applications of reinforcement learning. A large proportion of applications involve benchmark or toy domains, such as the inverted pendulum problem (Anderson 1987), single and double cart-pole balancing (Igel 2003; van Hasselt and Wiering 2007; Hans and Udluft 2010), the pole swing-up task (Gabel et al. 2011), keep-away (Whiteson et al. 2010), the mountain car problem (Wiering and van Hasselt 2007; Whiteson et al. 2010), and maze problems (Wiering and van Hasselt 2007). Neural networks are a common representation for learning games due to their ability to generalize, and applications include games such as Tic-tac-toe (Wiering 1995; Konen and Beielstein 2009; Gatti and Embrechts 2012), Chung-Toi (a variant of Tic-tac-toe) (Gatti et al. 2011a, b), checkers (Patist and Wiering 2004; Wiering et al. 2007), chess (Thrun 1995; Mannen and Wiering 2004; Wiering et al. 2007), Othello (van Eck and van Wezel 2008), Go (Runarsson and Lucas 2005), and backgammon (Tesauro 1995; Wiering et al. 2007; Wiering 2010; Papahristou and Refanidis 2011). Finally, the use of neural networks for reinforcement learning in the real world (including real-world conceptual problems) includes applications such as control problems (Werbos 1989; Mitchell and Thrun 1992; Yamada 2011), stock price prediction (Lee 2001) and trading (Gorse 2011), product delivery and distribution (Proper and Tadepalli 2006), resource allocation (Tesauro et al. 2007), job-shop scheduling (Gabel and Riedmiller 2007), and technical process control (Hafner and Riedmiller 2011).
2.2.3 Learning Algorithms
The goal of learning algorithms in reinforcement learning is essentially to allow the agent to learn the dynamics of the environment so that an optimal set of actions may be selected to achieve a goal or obtain the greatest total reward. While numerous conceptual approaches, and thus numerous learning algorithms, have been developed for learning this environment-action mapping (see Sutton and Barto (1998), Szepesvári (2010), and Powell and Ma (2011) for comprehensive reviews of learning algorithms), in this work we focus on value function learning methods. In these methods, the agent attempts to learn a value function that approximates the value (i.e., utility) of states, though we briefly mention other reinforcement learning algorithms that have been developed.
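To make the value function idea concrete, the following is a minimal sketch of tabular TD(0) value learning on a hypothetical one-dimensional corridor task; the domain, the random-walk policy, and all parameter values are illustrative assumptions rather than details taken from this text.

    import random

    # Toy corridor: states 0..N, where reaching state N ends the
    # episode with reward +1. All names and values here are
    # hypothetical, chosen only to illustrate the update rule.
    N = 10               # terminal state index
    alpha = 0.1          # learning rate
    gamma = 0.95         # discount factor
    V = [0.0] * (N + 1)  # value estimate per state; V[N] stays 0 (terminal)

    for episode in range(5000):
        s = 0
        while s != N:
            # Random-walk policy: step left or right, clamped at state 0.
            s_next = max(0, s + random.choice([-1, 1]))
            r = 1.0 if s_next == N else 0.0
            # TD(0) update: move V(s) toward the bootstrapped
            # target r + gamma * V(s').
            V[s] += alpha * (r + gamma * V[s_next] - V[s])
            s = s_next

    # Learned values increase toward the rewarding terminal state.
    print([round(v, 2) for v in V])

Replacing the table V with a neural network that maps a state encoding to a scalar value, and the tabular update with a gradient step toward the same TD target, yields the function-approximation setting discussed in the preceding section.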
The representation (i.e., agent, defined in Sect. 2.2.2) serves two purposes in reinforcement learning. The first purpose is to learn about the dynamics of the domain and how the selection of actions relates to feedback delivered by the domain. In other