left and right sides of the figure, respectively. Each thick line represents the extent
of a particular parameter for a particular subregion, which is numbered on the sides
of the thick lines. Variables are grouped together (rather than grouped by subregion) so that
differences in the variable ranges can be easily seen. An example of this figure will
be presented later in this section when a 2-dimensional example function is explored
using sequential CART.
For each of the convergent subregions, we present some summary statistics, which
are computed as follows. We compute the proportion of convergent points from the
entire experimental design that fall into the respective subregions ($p_{\mathrm{conv}}$), as well as
the total number of points in each subregion. Statistics on the shape and size of the
subregions include the dimensionality ratio (Eq. 4.1), the average radius, and the
sum of the radii. The convergent parameter subregion boundaries are all normalized
to [0, 1] before these metrics are computed. The dimensionality ratio indicates
whether or not the parameter ranges of the subregion are proportional. The average
radius is the mean of all radii of the subregion, and the sum of the radii is the total
of the subregion radii across all parameters; both are computed from Eq. 4.2.
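As an illustration, these summary statistics might be computed as in the following Python sketch. The exact forms of Eqs. 4.1 and 4.2 are not reproduced in this excerpt, so the half-width radius and the min/max dimensionality ratio used here are assumptions, as are the function and variable names.

```python
import numpy as np

def subregion_summary(bounds, points, convergent_mask):
    """Summary statistics for one convergent subregion.

    bounds: (d, 2) array of per-parameter [lo, hi] limits, assumed
            already normalized to [0, 1] as described in the text.
    points: (n, d) array of experimental-design points (normalized).
    convergent_mask: length-n booleans marking convergent runs.
    """
    bounds = np.asarray(bounds, dtype=float)
    points = np.asarray(points, dtype=float)
    mask = np.asarray(convergent_mask, dtype=bool)

    # A point belongs to the subregion if it lies inside every
    # parameter's interval.
    inside = np.all((points >= bounds[:, 0]) & (points <= bounds[:, 1]),
                    axis=1)

    # p_conv: fraction of all convergent design points that fall
    # into this subregion.
    p_conv = (inside & mask).sum() / max(mask.sum(), 1)

    # Per-parameter radius taken as half the normalized interval width
    # (an assumption; Eq. 4.2 is not reproduced in this excerpt).
    radii = (bounds[:, 1] - bounds[:, 0]) / 2.0
    # Dimensionality ratio sketched as min/max radius, so a value of 1
    # means proportional ranges in every parameter (also an assumption).
    dim_ratio = radii.min() / radii.max()

    return {
        "n_points": int(inside.sum()),
        "p_conv": float(p_conv),
        "avg_radius": float(radii.mean()),
        "sum_radii": float(radii.sum()),
        "dim_ratio": float(dim_ratio),
    }
```

For example, `subregion_summary([[0.0, 0.5], [0.0, 1.0]], design, mask)` returns the statistics for a subregion spanning half of the first parameter's range and all of the second's.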
Regional sensitivity analysis (RSA) (Hornberger and Spear 1981; Saltelli et al.
2004; Ratto et al. 2007), also called Monte Carlo filtering, is used as a model-free
method to investigate the univariate effects of parameters on the binary outcome
variable of whether or not a reinforcement learning run had converged. For each
parameter, RSA shows the cumulative distributions for what are called behavioral
and non-behavioral groups of points, where in our case, these groups correspond
to convergent and non-convergent reinforcement learning runs, respectively. Any
differences in the cumulative distributions between the two groups indicate that
different ranges of the respective parameter have an effect on learning convergence.
This difference can be quantified using the Kolmogorov-Smirnov statistic
$KS = \sup_x \left| F(x \mid B) - F(x \mid \bar{B}) \right|$,
which is the supremum of the distances between the behavioral (i.e., convergent)
distribution $F(x \mid B)$ and the non-behavioral (i.e., non-convergent) distribution
$F(x \mid \bar{B})$. We use this statistic as a quantitative measure to compare
distributions between parameters.
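A minimal sketch of this KS comparison, assuming each parameter's sampled values and a boolean convergence mask are available as arrays (the function name `rsa_ks` and its interface are illustrative):

```python
import numpy as np

def rsa_ks(param_values, convergent_mask):
    """Kolmogorov-Smirnov distance between the behavioral (convergent)
    and non-behavioral (non-convergent) distributions of one parameter:
    KS = sup_x |F(x|B) - F(x|~B)|.
    """
    x = np.asarray(param_values, dtype=float)
    mask = np.asarray(convergent_mask, dtype=bool)
    b = np.sort(x[mask])     # behavioral (convergent) sample
    nb = np.sort(x[~mask])   # non-behavioral (non-convergent) sample

    # Evaluate both empirical CDFs on the pooled sample; the supremum
    # of |F_B - F_~B| is attained at one of these sample points.
    grid = np.sort(x)
    f_b = np.searchsorted(b, grid, side="right") / len(b)
    f_nb = np.searchsorted(nb, grid, side="right") / len(nb)
    return float(np.max(np.abs(f_b - f_nb)))
```

This hand-rolled version matches the two-sample statistic computed by `scipy.stats.ks_2samp(x[mask], x[~mask])`; it is written out here only to make the supremum-of-ECDF-distances structure explicit.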
4.1.4 Empirical Convergence Criteria
The description of the sequential CART algorithm is based on the notion of
convergence for reinforcement learning, which is defined here. Many learning
algorithms (e.g., unsupervised or supervised, and some reinforcement learning
implementations) have theoretical convergence guarantees. However, neural
network-based reinforcement learning does not have such a guarantee, and learning
performance must be quantified and assessed in another manner. Furthermore,
reinforcement learning is not typically subjected to a large-scale analysis such as
ours; performance is often judged simply by inspecting, visually or quantitatively,
a handful of plots of performance over the course of training. We use a novel