These problems are simpler than their real-world, end-goal counterparts, but the
insights gained from them can be valuable in achieving that end goal.
For example, in the multi-armed bandit problem, an agent attempts to maximize
its reward from multiple single-armed bandits (i.e., slot machines), where each
bandit has a different expected reward. Learning how to maximize performance in
this domain requires both exploitation of existing knowledge and exploration of
potentially better actions. This problem scenario may be used to gain an
understanding of the dynamics of a problem, and such knowledge has been used to
aid in the application of similar methods to problems in energy management or
robotics (Galichet et al. 2013; Karnin et al. 2013).
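To make the exploitation-exploration trade-off concrete, the following minimal sketch runs an epsilon-greedy agent on a set of Gaussian bandits. The arm means, the value of epsilon, and the function name are illustrative assumptions rather than details from the cited works.

```python
import random

def epsilon_greedy_bandit(true_means, steps=10_000, epsilon=0.1, seed=0):
    """Minimal epsilon-greedy agent on a Gaussian multi-armed bandit (illustrative)."""
    rng = random.Random(seed)
    n = len(true_means)
    counts = [0] * n          # number of pulls per arm
    estimates = [0.0] * n     # sample-mean reward estimate per arm
    total_reward = 0.0
    for _ in range(steps):
        # Explore with probability epsilon; otherwise exploit the best estimate.
        if rng.random() < epsilon:
            arm = rng.randrange(n)
        else:
            arm = max(range(n), key=lambda a: estimates[a])
        reward = rng.gauss(true_means[arm], 1.0)  # noisy payout from the chosen bandit
        counts[arm] += 1
        # Incremental update of the sample mean for the pulled arm.
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total_reward += reward
    return estimates, total_reward

# Example: three arms with assumed expected rewards of 0.2, 0.5, and 0.9.
estimates, total = epsilon_greedy_bandit([0.2, 0.5, 0.9])
```

With a small epsilon, the agent spends most pulls on the arm it currently believes is best, so its total reward hinges on how much it explored early on; this is the exploitation-exploration tension described above.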
2.1.4 Generalized Domains
There is another class of domains to which reinforcement learning has been applied
that deserves attention. These domains are not based on any physical, real-world,
or game-like setting; rather, they can be considered generalized domains that are
either completely abstract environments or environments with parameterized
characteristics. The term abstract in this sense refers to an environment that
lacks a physical representation and is merely a numerical representation with
defined properties, dynamics, and constraints. While not all of the environments
mentioned in this section have been used with reinforcement learning specifically,
all have been used with some form of sequential decision-making algorithm.
One of the earliest purely abstract generalized domains was developed for
infinite-horizon problems by Archibald et al. (1995), out of a need for a more
flexible and widely available platform for assessing the characteristics of
Markov decision processes. This domain is highly parameterized, using 14 variables
to determine its dynamics; one of the novelties of the work was that it allowed
control of the mixing speed of the problem (Aldous 1983), which is related to the
relative importance of the starting state of the process. However, this type of
domain is based on enumerated states used with look-up table representations,
rather than the state vectors commonly used in present-day problems.
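To illustrate the distinction, the sketch below shows the enumerated-state, look-up table style of representation: values are stored in a table indexed directly by integer state and action identifiers, with no feature vector involved. This is a generic tabular Q-learning backup, not Archibald et al.'s generator, and the names and constants here are hypothetical.

```python
import random

n_states, n_actions = 50, 4
# Look-up table representation: one stored value per enumerated (state, action) pair.
Q = [[0.0] * n_actions for _ in range(n_states)]

def td_update(s, a, r, s_next, alpha=0.1, gamma=0.95):
    """One tabular Q-learning backup, indexing the table by integer state ids."""
    best_next = max(Q[s_next])
    Q[s][a] += alpha * (r + gamma * best_next - Q[s][a])

# Example backup on a randomly chosen transition.
rng = random.Random(0)
s, a = rng.randrange(n_states), rng.randrange(n_actions)
td_update(s, a, r=rng.random(), s_next=rng.randrange(n_states))
```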
Garnet (Generalized Average Reward Non-stationary Environment Testbed) problems
are a similar type of generalized abstract domain that has been developed more
recently, though these domains are for episodic learning problems (Bhatnagar et al.
2009; Castro and Mannor 2010; Awate 2009), as is considered in the present work.
The domains generated by Bhatnagar et al. (2009) are based on a simpler,
five-parameter construction defined by: (1) the number of states, (2) the number
of actions, (3) the branching factor, (4) the standard deviation of the reward(s),
and (5) the stationarity. Two additional parameters are the dimensionality and the
number of non-zero components of the state vectors. The domains created by Awate
(2009) and Castro and Mannor (2010) omit the stationarity parameter but otherwise
use the same domain construction method.
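This five-parameter construction lends itself to a compact generator. The sketch below builds a stationary Garnet-style MDP from the number of states, number of actions, branching factor, and reward standard deviation, and attaches sparse state vectors with a given dimensionality and number of non-zero components. The function name, default values, and the random-cut scheme for splitting transition probabilities are assumptions made for illustration, not the exact method of the cited papers.

```python
import numpy as np

def make_garnet(n_states, n_actions, branching, reward_std,
                dim=8, n_nonzero=3, seed=0):
    """Generate a stationary Garnet-style MDP (an illustrative sketch)."""
    rng = np.random.default_rng(seed)
    # For each (state, action) pair, probability mass is spread over
    # `branching` randomly chosen successor states.
    P = np.zeros((n_states, n_actions, n_states))
    for s in range(n_states):
        for a in range(n_actions):
            successors = rng.choice(n_states, size=branching, replace=False)
            # Random cut points on [0, 1] give a random split of the mass.
            cuts = np.sort(rng.uniform(size=branching - 1))
            P[s, a, successors] = np.diff(np.concatenate(([0.0], cuts, [1.0])))
    # Mean reward per (state, action); sampled rewards add Gaussian noise
    # with the given standard deviation.
    R_mean = rng.uniform(size=(n_states, n_actions))
    def sample_reward(s, a):
        return R_mean[s, a] + reward_std * rng.standard_normal()
    # Sparse state vectors: `n_nonzero` of `dim` components are non-zero.
    features = np.zeros((n_states, dim))
    for s in range(n_states):
        idx = rng.choice(dim, size=n_nonzero, replace=False)
        features[s, idx] = rng.standard_normal(n_nonzero)
    return P, sample_reward, features

# Example: 30 states, 4 actions, branching factor 3, reward noise 0.1.
P, sample_reward, X = make_garnet(30, 4, branching=3, reward_std=0.1)
```

Omitting the stationarity parameter, as in the episodic variants above, simply means the transition tensor P is generated once and held fixed for the life of the problem.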