left and right sides of the figure, respectively. Each thick line represents the extent
of a particular parameter for a particular subregion, which is numbered on the sides
of the thick lines. Variables are grouped together (rather than grouped by subregion) so that
differences in the variable ranges can be easily seen. An example of this figure will
be presented later in this section when a 2-dimensional example function is explored
using sequential CART.
For each of the convergent subregions, we present some summary statistics, which
are computed as follows. We compute the proportion of convergent points from the
entire experimental design that fall into the respective subregions ($p_{\mathrm{conv}}$), as well as
the total number of points in each subregion. Statistics on the shape and size of the
subregions include the dimensionality ratio (Eq. 4.1), the average radius, and the
sum of the radii. The convergent parameter subregion boundaries are all normalized
to [0, 1] before these metrics are computed. The dimensionality ratio indicates
whether or not the parameter ranges of the subregion are proportional. The average
radius is the mean of all radii of the subregion, and the sum of the radii is the total
of the subregion radii across all parameters; both are computed from Eq. 4.2.
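As an illustration, these summary statistics might be computed as in the following Python sketch. The exact forms of Eqs. 4.1 and 4.2 are not reproduced in this excerpt, so the half-width radius and the min/max dimensionality ratio used here are assumptions, as are the function and variable names.

```python
import numpy as np

def subregion_summary(bounds, points, convergent_mask):
    """Summary statistics for one convergent subregion.

    bounds: (d, 2) array of per-parameter [lo, hi] limits, assumed
            already normalized to [0, 1] as described in the text.
    points: (n, d) array of experimental-design points (normalized).
    convergent_mask: length-n booleans marking convergent runs.
    """
    bounds = np.asarray(bounds, dtype=float)
    points = np.asarray(points, dtype=float)
    mask = np.asarray(convergent_mask, dtype=bool)

    # A point belongs to the subregion if it lies inside every
    # parameter's interval.
    inside = np.all((points >= bounds[:, 0]) & (points <= bounds[:, 1]),
                    axis=1)

    # p_conv: fraction of all convergent design points that fall
    # into this subregion.
    p_conv = (inside & mask).sum() / max(mask.sum(), 1)

    # Per-parameter radius taken as half the normalized interval width
    # (an assumption; Eq. 4.2 is not reproduced in this excerpt).
    radii = (bounds[:, 1] - bounds[:, 0]) / 2.0
    # Dimensionality ratio sketched as min/max radius, so a value of 1
    # means proportional ranges in every parameter (also an assumption).
    dim_ratio = radii.min() / radii.max()

    return {
        "n_points": int(inside.sum()),
        "p_conv": float(p_conv),
        "avg_radius": float(radii.mean()),
        "sum_radii": float(radii.sum()),
        "dim_ratio": float(dim_ratio),
    }
```

For example, `subregion_summary([[0.0, 0.5], [0.0, 1.0]], design, mask)` returns the statistics for a subregion spanning half of the first parameter's range and all of the second's.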
Regional sensitivity analysis (RSA) (Hornberger and Spear 1981; Saltelli et al.
2004; Ratto et al. 2007), also called Monte Carlo filtering, is used as a model-free
method to investigate the univariate effects of parameters on the binary outcome
variable of whether or not a reinforcement learning run had converged. For each
parameter, RSA shows the cumulative distributions for what are called behavioral
and non-behavioral groups of points, where in our case, these groups correspond
to convergent and non-convergent reinforcement learning runs, respectively. Any
differences in the cumulative distributions between the two groups indicate that
different ranges of the respective parameter have an effect on learning convergence.
This difference can be quantified using the Kolmogorov-Smirnov statistic
$KS = \sup_x \left| F(x \mid B) - F(x \mid \bar{B}) \right|$,
which is the supremum of the distances between the behavioral (i.e., convergent)
distribution $F(x \mid B)$ and the non-behavioral (i.e., non-convergent) distribution
$F(x \mid \bar{B})$. We use this statistic as a quantitative measure to compare
distributions between parameters.
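A minimal sketch of this KS comparison, assuming each parameter's sampled values and a boolean convergence mask are available as arrays (the function name `rsa_ks` and its interface are illustrative):

```python
import numpy as np

def rsa_ks(param_values, convergent_mask):
    """Kolmogorov-Smirnov distance between the behavioral (convergent)
    and non-behavioral (non-convergent) distributions of one parameter:
    KS = sup_x |F(x|B) - F(x|~B)|.
    """
    x = np.asarray(param_values, dtype=float)
    mask = np.asarray(convergent_mask, dtype=bool)
    b = np.sort(x[mask])     # behavioral (convergent) sample
    nb = np.sort(x[~mask])   # non-behavioral (non-convergent) sample

    # Evaluate both empirical CDFs on the pooled sample; the supremum
    # of |F_B - F_~B| is attained at one of these sample points.
    grid = np.sort(x)
    f_b = np.searchsorted(b, grid, side="right") / len(b)
    f_nb = np.searchsorted(nb, grid, side="right") / len(nb)
    return float(np.max(np.abs(f_b - f_nb)))
```

This hand-rolled version matches the two-sample statistic computed by `scipy.stats.ks_2samp(x[mask], x[~mask])`; it is written out here only to make the supremum-of-ECDF-distances structure explicit.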
4.1.4 Empirical Convergence Criteria
The description of the sequential CART algorithm is based on the notion of
convergence for reinforcement learning, which is defined here. Many learning
algorithms (e.g., unsupervised or supervised, and some reinforcement learning
implementations) have theoretical convergence guarantees. However, neural
network-based reinforcement learning does not have such a guarantee, and learning
performance must be quantified and assessed in another manner. Furthermore,
reinforcement learning is not typically subjected to a large-scale analysis such as
ours; performance is often judged simply by inspecting, visually or quantitatively,
a handful of plots of performance over the course of training. We use a novel