by Kalyanakrishnan and Stone (2007) showed that experience replay effectively increases performance, while the implementations by Lin (1992) and van Seijen et al. (2011) showed that experience replay increased the speed of learning, though the maximal performance was similar to that achieved without experience replay.
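The mechanism behind experience replay can be sketched as follows. This is a minimal illustration, not the implementation from any of the cited works: stored transitions are replayed at random, and each replayed transition receives the same tabular Q-learning update as a fresh one (the two-action domain and all parameter values are assumptions for the example).

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-size store of past transitions for experience replay."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)

    def add(self, state, action, reward, next_state):
        self.buffer.append((state, action, reward, next_state))

    def sample(self, batch_size):
        # Replay a random batch of stored transitions; each one
        # is used for an extra learning update.
        return random.sample(self.buffer, min(batch_size, len(self.buffer)))


def q_update(q, s, a, r, s_next, actions=(0, 1), alpha=0.1, gamma=0.9):
    """One tabular Q-learning update; the same rule is applied to
    both fresh and replayed transitions."""
    best_next = max(q.get((s_next, b), 0.0) for b in actions)
    old = q.get((s, a), 0.0)
    q[(s, a)] = old + alpha * (r + gamma * best_next - old)
```

In use, the agent would call `q_update` once on each newly observed transition and then again on a sampled batch, which is how replay trades extra computation for faster learning from the same experience.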
Another heuristic, called database games, consists of having the agent merely observe games that have been played by human or computer players and that are stored in a database (Tesauro 1995; Thrun 1995; Patist and Wiering 2004; Mannen and Wiering 2004; Wiering et al. 2007). In this case, the agent does not select actions as in traditional reinforcement learning, but instead learns from predetermined actions. The level of play of the database games can vary, but most often high-level games played by expert human players are used.
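A minimal sketch of this idea, under the assumption that each logged game is a sequence of (state, action, reward, next-state) tuples: the agent applies ordinary tabular Q-learning updates along the recorded play and never chooses an action itself. This is an illustration of the heuristic, not the method of any cited paper.

```python
def learn_from_database(games, actions, alpha=0.1, gamma=0.9):
    """Tabular Q-learning over logged games: the actions are
    predetermined by the database, so the agent only updates
    values along the recorded trajectories."""
    q = {}
    for game in games:                     # each game: list of (s, a, r, s_next)
        for s, a, r, s_next in game:
            best_next = max(q.get((s_next, b), 0.0) for b in actions)
            old = q.get((s, a), 0.0)
            q[(s, a)] = old + alpha * (r + gamma * best_next - old)
    return q
```

Because the database fixes which actions are observed, the quality of the resulting value estimates depends directly on the level of play in the stored games, which is why expert games are usually preferred.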
A closely related heuristic is that of transfer learning (Taylor and Stone 2009) or relational learning (Torrey 2009). Recall that in reinforcement learning, the agent generally begins with no knowledge about the problem domain. In transfer and relational learning, knowledge that has been learned about one task is utilized to improve the learning process and efficiency in another, related task. A 'related' task can take different forms, some of which include sharing features (Konidaris et al. 2012), altering the allowable actions, altering the reward structure, or generalizing the applicability. This approach is also very similar to that of inductive learning from an artificial intelligence perspective (Michalski 1983), for which there has been some work that leverages this learning approach to develop agents that learn provably optimal solutions (Schmidhuber 2005, 2006).
Specific domain or expert knowledge has also been used to improve learning efficacy. These methods exploit domain information (Hoffmann and Freier 1996), modify the representation (Schraudolph et al. 1994), or use an expertly-contrived set of state features or state encoding scheme (Tesauro 1995; Ghory 2004; Konen and Beielstein 2008; Silver et al. 2012). An alternative approach to using an explicit set of state features is to use features that are essentially 'discovered' to be useful based on spectral analysis of the agent's empirical state transition graph (Mahadevan and Maggioni 2007). Domain information has also been exploited by partitioning the state space into a small number of groups, where the states within each group have similar values, but where the groups themselves represent relatively unique scenarios (Wiering 1995). An effect of this approach is that the entire, and potentially discontinuous, state space is partitioned into groups such that each group has a smooth and continuous state subspace. A somewhat related method is to use a hierarchical approach, which also essentially results in a partitioning of the state space, but where subgoals are used to achieve a single overarching goal (Bakker and Schmidhuber 2004; Şimşek and Barto 2004).
2.3.1.1 Effectors of Reinforcement Learning Performance
Despite the widely varying results of reinforcement learning in a variety of domains
and circumstances, there is relatively little work explicitly investigating exactly what