Gridworld has also served as the base for a generalized domain developed by
Kalyanakrishnan and Stone (2009, 2011). These works used a Gridworld-like param-
eterized domain to evaluate how value function and policy search learning algorithms
perform under specific domain variations. These parameterized domains were devel-
oped to assess the effects of a small and specific set of environment characteristics on
the performance of certain learning algorithms, and thus the parameterization of the
domains was not extensive, relying on only four parameters: (1) domain size, (2)
stochasticity, (3) reward distribution, and (4) observability. In the works cited, these
domains restrict the state space to only 48 states, making it possible to benchmark
the performance of the learning algorithms on each domain against an optimal action
set determined using dynamic programming.
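A small parameterized Gridworld of this kind can be sketched in a few lines of Python. The parameter names below (size, slip_prob, goal_reward) are illustrative assumptions and do not reproduce the exact parameterization of Kalyanakrishnan and Stone; the sketch only shows how a small, fully enumerable state space allows the optimal values to be computed by dynamic programming and used as a reference.

import numpy as np

def value_iteration(size=4, slip_prob=0.1, goal_reward=1.0, gamma=0.95, tol=1e-8):
    """Optimal state values of a slippery Gridworld, computed by dynamic
    programming; the goal is the bottom-right cell and the episode ends there."""
    n_states = size * size
    goal = n_states - 1
    actions = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right

    def step(s, a):
        # Deterministic effect of action a in state s (movement clipped at the walls).
        r, c = divmod(s, size)
        dr, dc = actions[a]
        r2 = min(max(r + dr, 0), size - 1)
        c2 = min(max(c + dc, 0), size - 1)
        return r2 * size + c2

    V = np.zeros(n_states)
    while True:
        V_new = np.zeros(n_states)
        for s in range(n_states):
            if s == goal:
                continue  # terminal state keeps value 0
            q_values = []
            for a in range(4):
                q = 0.0
                for a_eff in range(4):
                    # With probability slip_prob the executed action is random.
                    p = (1 - slip_prob) + slip_prob / 4 if a_eff == a else slip_prob / 4
                    s2 = step(s, a_eff)
                    r = goal_reward if s2 == goal else 0.0
                    q += p * (r + gamma * V[s2])
                q_values.append(q)
            V_new[s] = max(q_values)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new
        V = V_new

print(value_iteration(size=4, slip_prob=0.2).reshape(4, 4))

The resulting value table plays the role of the optimal reference against which learned behavior can be benchmarked, and it also illustrates the smoothness and visualizability of the value function discussed next.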
These Gridworld-based domains have a number of attractive properties, includ-
ing: the smoothness of the value function due to the connectivity of the states; the
2-dimensional representation of the state space allowing for 3-dimensional visual-
ization of the value function; and the small size of these problems allowing for an
optimal solution to be found and used as a performance reference. Whiteson and
colleagues also advocate for the use of generalized domains in order to guard against
overfitting reinforcement learning methods to specific environments, with the goal of
improving our understanding of how methods generalize to other domains (Whiteson
et al. 2009, 2011). Their work uses some of the classic reinforcement learning
benchmark problems such as the mountain car problem, acrobot, and puddle world
as base environments. Domain characteristics, such as action perturbations or state
observability, are then sampled from distributions, which produces domains with
more general characteristics.
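The idea of sampling domain characteristics can be sketched as follows. The particular distributions and parameter names here are illustrative assumptions, not the settings used by Whiteson and colleagues; the point is only that each evaluation run draws a fresh domain variant rather than reusing one fixed environment.

import random

def sample_domain_variant(seed=None):
    rng = random.Random(seed)
    return {
        # Probability that the executed action is perturbed to a random action.
        "action_perturbation": rng.uniform(0.0, 0.3),
        # Standard deviation of Gaussian noise added to each state observation.
        "observation_noise_std": rng.uniform(0.0, 0.1),
    }

# Evaluating a method across many sampled variants, rather than on a single
# fixed environment, guards against overfitting to that one instance.
variants = [sample_domain_variant(seed=i) for i in range(5)]
for v in variants:
    print(v)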
2.2 Components of Reinforcement Learning
Reinforcement learning is based on the interaction between three different compo-
nents or entities (Fig. 2.1). These include: (1) the domain, or the environment in
which the agent acts; (2) the learning algorithm, or the procedure by which the agent
associates actions with outcomes; and (3) the representation of the learned knowl-
edge, or the agent itself. Figure 2.6 shows each of these three main components and
how each can be further broken down into more specific methods. The figure is not
comprehensive, especially with regard to the representations and the learning
algorithms, and is intended only to provide examples of these methods.
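The separation into these three components can be made concrete in a short sketch: a domain (the environment), a representation of the learned knowledge (here a tabular Q-function), and a learning algorithm (here a simple Q-learning update). The toy chain environment and the specific learning parameters are illustrative assumptions, not an example taken from the figure.

import random

class ChainDomain:
    """Domain: a 5-state chain; action 1 moves right, action 0 moves left.
    Reaching the right end gives reward 1 and ends the episode."""
    def __init__(self, n=5):
        self.n = n

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        self.state = min(self.state + 1, self.n - 1) if action == 1 else max(self.state - 1, 0)
        done = self.state == self.n - 1
        return self.state, (1.0 if done else 0.0), done

# Representation: a table mapping each (state, action) pair to an estimated value.
Q = {(s, a): 0.0 for s in range(5) for a in (0, 1)}

# Learning algorithm: epsilon-greedy Q-learning over repeated interactions.
alpha, gamma, epsilon = 0.1, 0.95, 0.1
env = ChainDomain()
for episode in range(200):
    s, done = env.reset(), False
    while not done:
        a = random.choice((0, 1)) if random.random() < epsilon else max((0, 1), key=lambda b: Q[(s, b)])
        s2, r, done = env.step(a)
        best_next = max(Q[(s2, 0)], Q[(s2, 1)])
        Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
        s = s2

print({s: max(Q[(s, 0)], Q[(s, 1)]) for s in range(5)})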
2.2.1 Domains
The domain in reinforcement learning is the environment with which the agent interacts.
The domain specifies the state space, the action space, and the reward function, and
all of these elements affect the dynamics of the agent's behavior. More
concretely, the domain includes applications such as those cited in Sect. 2.1, including
benchmark problems, games, and real-world applications. When one considers these