approach can be extended and modified to have multiple outputs for multiple actions, or to have multiple neural networks that represent either different actions or different regions of the state space (Wiering 1995; Kaelbling et al. 1996; Lazaric 2008; Kalyanakrishnan and Stone 2009). None of these rather rudimentary approaches has been found to be superior for all applications, though many implementations share the common quality of using networks with a single hidden layer (Ghory 2004).
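To make the multiple-output variant concrete, the sketch below is a minimal NumPy example and not an implementation from any of the cited works: a single-hidden-layer network (the layer sizes and the tanh activation are arbitrary choices for illustration) maps a state vector to one estimated value per action. The alternative of one network per action amounts to keeping a list of such networks with a single output each.

import numpy as np

class ActionValueNetwork:
    # Single hidden layer, one output per action (illustrative sketch only).
    def __init__(self, n_state_features, n_hidden, n_actions, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(scale=0.1, size=(n_state_features, n_hidden))
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.normal(scale=0.1, size=(n_hidden, n_actions))
        self.b2 = np.zeros(n_actions)

    def values(self, state):
        # The hidden layer plays the role of the implicitly derived features.
        hidden = np.tanh(state @ self.W1 + self.b1)
        return hidden @ self.W2 + self.b2

# A 6-feature state evaluated for 4 actions; the greedy action maximises the output.
net = ActionValueNetwork(n_state_features=6, n_hidden=20, n_actions=4)
action_values = net.values(np.zeros(6))
greedy_action = int(np.argmax(action_values))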
The use of neural networks for reinforcement learning is attractive for a number of reasons. The first is that, unlike look-up tables but similar to linear methods, they can generalize state values to states that have not been explicitly visited. In domains with smooth and well-behaved value functions, this can be extremely useful and can reduce the amount of training required to learn the domain. The second reason is that neural networks are parameterized by a relatively small number of parameters, especially when compared to the size of the state space of some domains. Finally, neural networks do not necessarily require carefully hand-crafted state features as inputs. Rather, the hidden layer(s) of the neural network can derive implicit features that are deemed to be useful (Konen and Beielstein 2009), though these derived features often cannot be interpreted (Günther 2008).
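The difference in parameter count can be illustrated with a back-of-the-envelope comparison; the domain sizes below are hypothetical, chosen only to show the scale of the gap between a look-up table and a single-hidden-layer network.

# Hypothetical domain: 10 state variables, each discretised into 20 levels.
n_table_entries = 20 ** 10                       # one stored value per state

# Single-hidden-layer network over the same 10 raw state variables.
n_inputs, n_hidden, n_outputs = 10, 50, 1
n_weights = (n_inputs * n_hidden + n_hidden) + (n_hidden * n_outputs + n_outputs)

print(n_table_entries)   # 10_240_000_000_000 table entries
print(n_weights)         # 601 network parameters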
Despite these attractive properties, neural networks are not the dominant representation for most of the reinforcement learning community, for a number of reasons. Replacing a look-up table with a complex function approximator, such as a neural network, is not trivial and is viewed by some as not being robust (Boyan and Moore 1995). The convergence proofs of reinforcement learning algorithms for linear methods have not been extended to non-linear function approximators (Tsitsiklis and Roy 1996, 1997), and these learning algorithms may find suboptimal solutions or may diverge (Boyan and Moore 1995; Bertsekas and Tsitsiklis 1996). The primary reason for this is that non-linear methods tend to exaggerate small changes in the target function, and this exaggeration causes the value iteration algorithm to become unstable (Thrun and Schwartz 1993; Gordon 1995). It has also been suggested that the use of discounting in reinforcement learning can potentially lead to instability with value iteration algorithms (Thrun and Schwartz 1993).
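The update at the centre of this instability can be written out explicitly. The sketch below is a generic semi-gradient TD(0) step with a non-linear value network, not the specific algorithms analysed by the cited authors; the layer sizes, tanh activation, learning rate, and sample transition are all arbitrary illustrative choices. The point to note is that the bootstrapped target reuses the network's own estimate at the next state, so an exaggerated change to the function at one state feeds directly into the targets computed for others.

import numpy as np

rng = np.random.default_rng(0)
n_inputs, n_hidden = 4, 16
gamma, alpha = 0.99, 1e-3

# Single-hidden-layer value network v(s); all sizes are arbitrary for the example.
W1 = rng.normal(scale=0.1, size=(n_inputs, n_hidden))
b1 = np.zeros(n_hidden)
w2 = rng.normal(scale=0.1, size=n_hidden)
b2 = 0.0

def value(s):
    h = np.tanh(s @ W1 + b1)
    return h, float(h @ w2 + b2)

# One observed transition (s, r, s').
s, r, s_next = np.ones(n_inputs), 1.0, np.zeros(n_inputs)

# Semi-gradient TD(0): the bootstrapped target reuses the network's own
# (possibly already wrong) estimate at s'.
h, v_s = value(s)
_, v_next = value(s_next)
delta = (r + gamma * v_next) - v_s     # TD error

# Gradient of v(s) with respect to each parameter group (back-prop through tanh).
dh = w2 * (1.0 - h ** 2)
W1 += alpha * delta * np.outer(s, dh)
b1 += alpha * delta * dh
w2 += alpha * delta * h
b2 += alpha * delta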
The successful use of a neural network depends on the properties of the underlying value function as well as on the ability of the network to approximate it (Thrun and Schwartz 1993; Dietterich 2000). Though the ability of a neural network to generalize to unvisited states is attractive, this property can be unfavorable when the value function is not smooth or not well-behaved, which may require longer training or could result in the learning algorithm diverging (Riedmiller 2005). Such situations may arise when the numerical representations of the state vectors are close together, yet their state values are quite different (Dayan 1993; Mahadevan and Maggioni 2007; Osentoski 2009). It has been suggested that the dynamic capability of a neural network depends on its architecture (Loone and Irwin 2001; Igel 2003), and thus a poorly chosen network architecture may either perform poorly or diverge during training (Hans and Udluft 2010); others claim that the neural network architecture must be tailored to the application (Günther 2008).