Gridworld has also served as the base for a generalized domain developed by
Kalyanakrishnan and Stone (2009, 2011). These works used a Gridworld-like param-
eterized domain to evaluate how value function and policy search learning algorithms
perform under specific domain variations. These parameterized domains were devel-
oped to assess the effects of a small and specific set of environment characteristics on
the performance of certain learning algorithms, and thus the parameterization of the
domains was not extensive, relying on only four parameters: (1) domain size, (2)
stochasticity, (3) reward distribution, and (4) observability. In the works cited, these
domains restrict the state space to only 48 states, making it possible to benchmark
the performance of the learning algorithms on each domain against an optimal action
set determined using dynamic programming.
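A small parameterized Gridworld of this kind can be sketched in a few lines of Python. The parameter names below (size, slip_prob, goal_reward) are illustrative assumptions and do not reproduce the exact parameterization of Kalyanakrishnan and Stone; the sketch only shows how a small, fully enumerable state space allows the optimal values to be computed by dynamic programming and used as a reference.

import numpy as np

def value_iteration(size=4, slip_prob=0.1, goal_reward=1.0, gamma=0.95, tol=1e-8):
    """Optimal state values of a slippery Gridworld, computed by dynamic
    programming; the goal is the bottom-right cell and the episode ends there."""
    n_states = size * size
    goal = n_states - 1
    actions = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right

    def step(s, a):
        # Deterministic effect of action a in state s (movement clipped at the walls).
        r, c = divmod(s, size)
        dr, dc = actions[a]
        r2 = min(max(r + dr, 0), size - 1)
        c2 = min(max(c + dc, 0), size - 1)
        return r2 * size + c2

    V = np.zeros(n_states)
    while True:
        V_new = np.zeros(n_states)
        for s in range(n_states):
            if s == goal:
                continue  # terminal state keeps value 0
            q_values = []
            for a in range(4):
                q = 0.0
                for a_eff in range(4):
                    # With probability slip_prob the executed action is random.
                    p = (1 - slip_prob) + slip_prob / 4 if a_eff == a else slip_prob / 4
                    s2 = step(s, a_eff)
                    r = goal_reward if s2 == goal else 0.0
                    q += p * (r + gamma * V[s2])
                q_values.append(q)
            V_new[s] = max(q_values)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new
        V = V_new

print(value_iteration(size=4, slip_prob=0.2).reshape(4, 4))

The resulting value table plays the role of the optimal reference against which learned behavior can be benchmarked, and it also illustrates the smoothness and visualizability of the value function discussed next.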
These Gridworld-based domains have a number of attractive properties, includ-
ing: the smoothness of the value function due to the connectivity of the states; the
2-dimensional representation of the state space allowing for 3-dimensional visual-
ization of the value function; and the small size of these problems allowing for an
optimal solution to be found and used as a performance reference. Whiteson and
colleagues also advocate for the use of generalized domains in order to guard against
overfitting reinforcement learning methods to specific environments, with the goal of
improving our understanding of how methods generalize to other domains (Whiteson
et al. 2009, 2011). Their work uses some of the classic reinforcement learning
benchmark problems such as the mountain car problem, acrobot, and puddle world
as base environments. Domain characteristics, such as action perturbations or state
observability, are then sampled from distributions, which produces domains with
more general characteristics.
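The idea of sampling domain characteristics can be sketched as follows. The particular distributions and parameter names here are illustrative assumptions, not the settings used by Whiteson and colleagues; the point is only that each evaluation run draws a fresh domain variant rather than reusing one fixed environment.

import random

def sample_domain_variant(seed=None):
    rng = random.Random(seed)
    return {
        # Probability that the executed action is perturbed to a random action.
        "action_perturbation": rng.uniform(0.0, 0.3),
        # Standard deviation of Gaussian noise added to each state observation.
        "observation_noise_std": rng.uniform(0.0, 0.1),
    }

# Evaluating a method across many sampled variants, rather than on a single
# fixed environment, guards against overfitting to that one instance.
variants = [sample_domain_variant(seed=i) for i in range(5)]
for v in variants:
    print(v)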
2.2 Components of Reinforcement Learning
Reinforcement learning is based on the interaction between three different compo-
nents or entities (Fig. 2.1). These include: (1) the domain, or the environment in
which the agent acts; (2) the learning algorithm, or the procedure by which the agent
associates actions with outcomes; and (3) the representation of the learned knowl-
edge, or the agent itself. Figure 2.6 shows each of these three main components and
how each can be further broken down into more specific methods. The figure is not
comprehensive, especially with regard to the representations and the learning
algorithms, and is intended only to provide examples of these methods.
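The separation into these three components can be made concrete in a short sketch: a domain (the environment), a representation of the learned knowledge (here a tabular Q-function), and a learning algorithm (here a simple Q-learning update). The toy chain environment and the specific learning parameters are illustrative assumptions, not an example taken from the figure.

import random

class ChainDomain:
    """Domain: a 5-state chain; action 1 moves right, action 0 moves left.
    Reaching the right end gives reward 1 and ends the episode."""
    def __init__(self, n=5):
        self.n = n

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        self.state = min(self.state + 1, self.n - 1) if action == 1 else max(self.state - 1, 0)
        done = self.state == self.n - 1
        return self.state, (1.0 if done else 0.0), done

# Representation: a table mapping each (state, action) pair to an estimated value.
Q = {(s, a): 0.0 for s in range(5) for a in (0, 1)}

# Learning algorithm: epsilon-greedy Q-learning over repeated interactions.
alpha, gamma, epsilon = 0.1, 0.95, 0.1
env = ChainDomain()
for episode in range(200):
    s, done = env.reset(), False
    while not done:
        a = random.choice((0, 1)) if random.random() < epsilon else max((0, 1), key=lambda b: Q[(s, b)])
        s2, r, done = env.step(a)
        best_next = max(Q[(s2, 0)], Q[(s2, 1)])
        Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
        s = s2

print({s: max(Q[(s, 0)], Q[(s, 1)]) for s in range(5)})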
2.2.1 Domains
The domain in reinforcement learning is the environment with which the agent interacts.
The domain specifies the state space, the action space, and the reward function, and
all of these elements affect the dynamics of the agent's behavior. More
concretely, the domain includes applications such as those cited in Sect. 2.1, including
benchmark problems, games, and real-world applications. When one considers these