by Kalyanakrishnan and Stone (2007) showed that experience replay effectively increases performance, the implementations by Lin (1992) and van Seijen et al. (2011) showed that experience replay increased the speed of learning, though the maximal performance was similar to that obtained without experience replay.
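As a rough illustration of the mechanism being compared in these studies, experience replay can be sketched for a tabular Q-learning agent as follows; the action set, buffer size, and learning parameters are assumptions made for the sketch rather than settings taken from the cited implementations.

import random
from collections import deque, defaultdict

# Minimal sketch of experience replay for tabular Q-learning.
# Action set, buffer size, and learning parameters are illustrative assumptions.
ACTIONS = [0, 1, 2, 3]
Q = defaultdict(lambda: {a: 0.0 for a in ACTIONS})   # Q[state][action]
buffer = deque(maxlen=10_000)                        # stored transitions

def store(state, action, reward, next_state, done):
    buffer.append((state, action, reward, next_state, done))

def replay(batch_size=32, alpha=0.1, gamma=0.99):
    # Re-learn from a random batch of previously observed transitions.
    batch = random.sample(list(buffer), min(batch_size, len(buffer)))
    for s, a, r, s_next, done in batch:
        target = r if done else r + gamma * max(Q[s_next].values())
        Q[s][a] += alpha * (target - Q[s][a])

Calling store() after every environment step and replay() periodically lets the agent update its estimates from past experience many times, which is the source of the speed-up reported in the studies above.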
Another heuristic, called database games, consists of having the agent merely observe games that have been played by human or computer players and that are stored in a database (Tesauro 1995; Thrun 1995; Patist and Wiering 2004; Mannen and Wiering 2004; Wiering et al. 2007). In this case, the agent does not select actions as in traditional reinforcement learning, but instead the agent learns from predetermined actions. The level of play of the database games can vary, but most often high-level games played by expert human players are used.
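A minimal sketch of the database-games idea, under the assumption that each stored game is recorded as a list of (state, reward) pairs, is to apply temporal-difference updates along the stored trajectories rather than along self-generated play:

from collections import defaultdict

# Sketch: TD(0) value learning from a database of stored games.
# The agent never selects actions; it only evaluates the observed states.
# The data format and learning parameters are illustrative assumptions.
V = defaultdict(float)   # state-value estimates

def learn_from_database(games, alpha=0.05, gamma=1.0):
    for game in games:
        for t in range(len(game) - 1):
            s, r = game[t]
            s_next, _ = game[t + 1]
            V[s] += alpha * (r + gamma * V[s_next] - V[s])
        # Terminal position: only the final reward remains.
        s_last, r_last = game[-1]
        V[s_last] += alpha * (r_last - V[s_last])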
A closely related heuristic is that of transfer learning (Taylor and Stone 2009) or relational learning (Torrey 2009). Recall that in reinforcement learning, the agent generally begins with no knowledge about the problem domain. In transfer and relational learning, knowledge that has been learned about one task is used to improve the learning process and efficiency in another, related task. A 'related' task can take different forms, some of which include sharing features (Konidaris et al. 2012), altering the allowable actions, altering the reward structure, or generalizing the applicability. This approach is also very similar to inductive learning from an artificial intelligence perspective (Michalski 1983), and some work has leveraged this style of learning to develop agents that learn provably optimal solutions (Schmidhuber 2005, 2006).
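One simple, assumed form of transfer between related tasks is to initialize the target task's value estimates from those learned on the source task; the sketch below illustrates this for a tabular Q-function and is not a specific method from the cited work.

from collections import defaultdict

# Sketch: transfer by value-function initialization. The target task is
# assumed to share states and actions with the source task (e.g., with an
# altered reward structure), so source-task Q-values serve as informed
# starting estimates instead of zeros.
def transfer_init(Q_source):
    Q_target = defaultdict(lambda: defaultdict(float))
    for state, action_values in Q_source.items():
        for action, value in action_values.items():
            Q_target[state][action] = value
    return Q_target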
Specific domain or expert knowledge has also been used to improve learning efficacy. These methods exploit domain information (Hoffmann and Freier 1996), modify the representation (Schraudolph et al. 1994), or use an expertly contrived set of state features or state encoding scheme (Tesauro 1995; Ghory 2004; Konen and Beielstein 2008; Silver et al. 2012). An alternative approach to using an explicit set of state features is to use features that are essentially 'discovered' to be useful based on spectral analysis of the agent's empirical state transition graph (Mahadevan and Maggioni 2007). Domain information has also been exploited by partitioning the state space into a small number of groups, where the states within each group have similar values, but where the groups themselves represent relatively unique scenarios (Wiering 1995). An effect of this approach is that the entire, and potentially discontinuous, state space is partitioned into groups such that each group has a smooth and continuous state subspace. A somewhat related method is to use a hierarchical approach, which also essentially results in a partitioning of the state space, but where subgoals are used to achieve a single overarching goal (Bakker and Schmidhuber 2004; Simsek and Barto 2004).
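A brief sketch of how such spectrally discovered features can be constructed, using an assumed toy set of observed transitions, is the following: build the empirical transition graph, form its Laplacian, and take the eigenvectors with the smallest eigenvalues as state features.

import numpy as np

# Sketch of spectrally 'discovered' state features (in the spirit of
# proto-value functions). The five-state transition sample is an
# illustrative assumption, not data from the cited work.
observed_transitions = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 3), (2, 1)]
n_states = 5

# Symmetric adjacency matrix of the empirical transition graph.
A = np.zeros((n_states, n_states))
for s, s_next in observed_transitions:
    A[s, s_next] = A[s_next, s] = 1.0

# Combinatorial graph Laplacian L = D - A.
L = np.diag(A.sum(axis=1)) - A

# Eigenvectors with the smallest eigenvalues vary smoothly over the graph
# and can serve as basis functions for value-function approximation.
eigenvalues, eigenvectors = np.linalg.eigh(L)
features = eigenvectors[:, :3]   # a 3-dimensional feature vector per state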
2.3.1.1 Effectors of Reinforcement Learning Performance
Despite the widely varying results of reinforcement learning in a variety of domains
and circumstances, there is relatively little work explicitly investigating exactly what