Chapter 4
Methodology
Implementing reinforcement learning requires setting several parameters related to
the learning algorithm as well as to the representation (a neural network in this
work). Additionally, the domain or problem to be learned can be characterized along
a variety of dimensions, including the state space, the action space, and the reward
function, among many others. This work is built on the fundamental belief that there
are subregions in the parameter space of the learning algorithm and the
representation that allow reinforcement learning to be successful for different
domain characteristics.
The goal of this work is to investigate under what parameter conditions
reinforcement learning works and, furthermore, how these parameters affect
performance. We therefore break this problem into two parts. The first part attempts
to find parameter subregions, within a large parameter space, for which reinforcement
learning is generally successful; we call these convergent subregions of the
parameter space, that is, regions in which reinforcement learning runs frequently
converge. The second part takes a closer look at these convergent subregions and
attempts to understand how the parameters affect learning performance and which
parameters are the most influential. The problem domains analyzed later in this work
use very similar experimental methodologies and analysis procedures, so rather than
repeating the methodology for each problem domain, we present the methods once in
this chapter.
4.1 Sequential CART
In this section we describe our novel approach for finding convergent parameter
subregions for reinforcement learning. Essentially, this procedure attempts to find
these convergent subregions by gradually narrowing down the parameter space using
a sequential experimentation procedure based on CART (classification and regression
tree) models. This section first briefly reviews CART models and then describes the
novel sequential CART experimentation approach. We conclude by demonstrating the
sequential CART procedure on an example 2-dimensional function in order to make the
concepts more concrete.
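
As a rough illustration of the narrowing idea only, and not the procedure as specified in this work, the following Python sketch repeatedly samples a 2-dimensional parameter box, fits a one-split classification tree (a depth-1 CART chosen by Gini impurity), and keeps the leaf with the higher proportion of converged runs. Everything specific here is an assumption made for illustration: the function converged is a hypothetical stand-in for a reinforcement learning run, the parameter names alpha and epsilon, the toy convergent region, the single split per round, and the sample sizes.

```python
import random


def converged(alpha, epsilon):
    """Hypothetical stand-in for one RL run; True means the run converged.

    Assumption: runs converge only inside a small box of the unit square.
    """
    return 0.2 <= alpha <= 0.5 and 0.6 <= epsilon <= 0.9


def gini(labels):
    """Gini impurity of a list of booleans (0.0 for an empty list)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def best_split(points, labels):
    """Return (dim, threshold) minimising weighted Gini, or None if no split helps."""
    best, best_score = None, gini(labels)
    for dim in range(len(points[0])):
        values = sorted({p[dim] for p in points})
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2.0
            left = [y for p, y in zip(points, labels) if p[dim] <= thr]
            right = [y for p, y in zip(points, labels) if p[dim] > thr]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if score < best_score:
                best, best_score = (dim, thr), score
    return best


def narrow(bounds, n_samples, rng):
    """One round: sample the box, fit a one-split tree, keep the better leaf."""
    points = [tuple(rng.uniform(lo, hi) for lo, hi in bounds) for _ in range(n_samples)]
    labels = [converged(*p) for p in points]
    split = best_split(points, labels)
    if split is None:  # sample is already pure; nothing to narrow
        return bounds
    dim, thr = split
    left = [y for p, y in zip(points, labels) if p[dim] <= thr]
    right = [y for p, y in zip(points, labels) if p[dim] > thr]
    if not left or not right:  # guard against a degenerate threshold
        return bounds
    lo, hi = bounds[dim]
    new_bounds = list(bounds)
    # Keep the side of the split with the higher observed convergence proportion.
    if sum(left) / len(left) >= sum(right) / len(right):
        new_bounds[dim] = (lo, thr)
    else:
        new_bounds[dim] = (thr, hi)
    return new_bounds


def sequential_search(bounds, rounds, n_samples, seed=0):
    """Apply the narrowing step for a fixed number of rounds."""
    rng = random.Random(seed)
    for _ in range(rounds):
        bounds = narrow(bounds, n_samples, rng)
    return bounds


def convergence_rate(bounds, n_samples, seed=1):
    """Estimate the fraction of converged runs inside a box by fresh sampling."""
    rng = random.Random(seed)
    hits = sum(
        converged(*(rng.uniform(lo, hi) for lo, hi in bounds))
        for _ in range(n_samples)
    )
    return hits / n_samples
```

Calling sequential_search([(0.0, 1.0), (0.0, 1.0)], rounds=6, n_samples=400) returns a smaller box, and convergence_rate can then be used to compare the proportion of converged runs in the starting box and in the narrowed one. A full CART model with several splits per round would partition the space more finely; the single split is used here only to keep the sketch short.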