by Kalyanakrishnan and Stone (2007) showed that experience replay effectively increases performance, the implementations by Lin (1992) and van Seijen et al. (2011) showed that experience replay increased the speed of learning, though the maximal performance was similar to that obtained without experience replay.
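As a rough illustration of the mechanism being compared in these studies, experience replay can be sketched for a tabular Q-learning agent as follows; the action set, buffer size, and learning parameters are assumptions made for the sketch rather than settings taken from the cited implementations.

import random
from collections import deque, defaultdict

# Minimal sketch of experience replay for tabular Q-learning.
# Action set, buffer size, and learning parameters are illustrative assumptions.
ACTIONS = [0, 1, 2, 3]
Q = defaultdict(lambda: {a: 0.0 for a in ACTIONS})   # Q[state][action]
buffer = deque(maxlen=10_000)                        # stored transitions

def store(state, action, reward, next_state, done):
    buffer.append((state, action, reward, next_state, done))

def replay(batch_size=32, alpha=0.1, gamma=0.99):
    # Re-learn from a random batch of previously observed transitions.
    batch = random.sample(list(buffer), min(batch_size, len(buffer)))
    for s, a, r, s_next, done in batch:
        target = r if done else r + gamma * max(Q[s_next].values())
        Q[s][a] += alpha * (target - Q[s][a])

Calling store() after every environment step and replay() periodically lets the agent update its estimates from past experience many times, which is the source of the speed-up reported in the studies above.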
Another heuristic, called database games, consists of having the agent merely observe games that have been played by human or computer players and that are stored in a database (Tesauro 1995; Thrun 1995; Patist and Wiering 2004; Mannen and Wiering 2004; Wiering et al. 2007). In this case, the agent does not select actions as in traditional reinforcement learning, but instead the agent learns from predetermined actions. The level of play of the database games can vary, but most often high-level games played by expert human players are used.
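A minimal sketch of the database-games idea, under the assumption that each stored game is recorded as a list of (state, reward) pairs, is to apply temporal-difference updates along the stored trajectories rather than along self-generated play:

from collections import defaultdict

# Sketch: TD(0) value learning from a database of stored games.
# The agent never selects actions; it only evaluates the observed states.
# The data format and learning parameters are illustrative assumptions.
V = defaultdict(float)   # state-value estimates

def learn_from_database(games, alpha=0.05, gamma=1.0):
    for game in games:
        for t in range(len(game) - 1):
            s, r = game[t]
            s_next, _ = game[t + 1]
            V[s] += alpha * (r + gamma * V[s_next] - V[s])
        # Terminal position: only the final reward remains.
        s_last, r_last = game[-1]
        V[s_last] += alpha * (r_last - V[s_last])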
A closely related heuristic is that of transfer learning (Taylor and Stone 2009) or relational learning (Torrey 2009). Recall that in reinforcement learning, the agent generally begins with no knowledge about the problem domain. In transfer and relational learning, knowledge that has been learned about one task is used to improve the learning process and efficiency in another, related task. A 'related' task can take different forms, some of which include sharing features (Konidaris et al. 2012), altering the allowable actions, altering the reward structure, or generalizing the applicability. This approach is also very similar to inductive learning from an artificial intelligence perspective (Michalski 1983), and some work has leveraged this style of learning to develop agents that learn provably optimal solutions (Schmidhuber 2005, 2006).
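One simple, assumed form of transfer between related tasks is to initialize the target task's value estimates from those learned on the source task; the sketch below illustrates this for a tabular Q-function and is not a specific method from the cited work.

from collections import defaultdict

# Sketch: transfer by value-function initialization. The target task is
# assumed to share states and actions with the source task (e.g., with an
# altered reward structure), so source-task Q-values serve as informed
# starting estimates instead of zeros.
def transfer_init(Q_source):
    Q_target = defaultdict(lambda: defaultdict(float))
    for state, action_values in Q_source.items():
        for action, value in action_values.items():
            Q_target[state][action] = value
    return Q_target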
Specific domain or expert knowledge has also been used to improve learning efficacy. These methods exploit domain information (Hoffmann and Freier 1996), modify the representation (Schraudolph et al. 1994), or use an expertly contrived set of state features or state encoding scheme (Tesauro 1995; Ghory 2004; Konen and Beielstein 2008; Silver et al. 2012). An alternative approach to using an explicit set of state features is to use features that are essentially 'discovered' to be useful based on spectral analysis of the agent's empirical state transition graph (Mahadevan and Maggioni 2007). Domain information has also been exploited by partitioning the state space into a small number of groups, where the states within each group have similar values, but where the groups themselves represent relatively unique scenarios (Wiering 1995). An effect of this approach is that the entire, and potentially discontinuous, state space is partitioned into groups such that each group has a smooth and continuous state subspace. A somewhat related method is to use a hierarchical approach, which also essentially results in a partitioning of the state space, but where subgoals are used to achieve a single overarching goal (Bakker and Schmidhuber 2004; Simsek and Barto 2004).
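A brief sketch of how such spectrally discovered features can be constructed, using an assumed toy set of observed transitions, is the following: build the empirical transition graph, form its Laplacian, and take the eigenvectors with the smallest eigenvalues as state features.

import numpy as np

# Sketch of spectrally 'discovered' state features (in the spirit of
# proto-value functions). The five-state transition sample is an
# illustrative assumption, not data from the cited work.
observed_transitions = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 3), (2, 1)]
n_states = 5

# Symmetric adjacency matrix of the empirical transition graph.
A = np.zeros((n_states, n_states))
for s, s_next in observed_transitions:
    A[s, s_next] = A[s_next, s] = 1.0

# Combinatorial graph Laplacian L = D - A.
L = np.diag(A.sum(axis=1)) - A

# Eigenvectors with the smallest eigenvalues vary smoothly over the graph
# and can serve as basis functions for value-function approximation.
eigenvalues, eigenvectors = np.linalg.eigh(L)
features = eigenvectors[:, :3]   # a 3-dimensional feature vector per state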
2.3.1.1 Effectors of Reinforcement Learning Performance
Despite the widely varying results of reinforcement learning in a variety of domains
and circumstances, there is relatively little work explicitly investigating exactly what