Discussion - Design of Experiments for Reinforcement Learning

Our problem could be considered similar to that of structural reliability, where the
goal is to find the probability of failure of a system or structure, and where one also
attempts to find a boundary (which may be a contour line rather than an axis-aligned
boundary) between two regions. In such a problem, there is typically some measurable
quantitative performance value that characterizes the quality of the system under
study; even if this value is poor or somewhat erroneous, there is always a value. In
other words, the code is non-degenerate. Our problem differs in that, when
reinforcement learning does not converge, there is no measurable value of
performance. Furthermore, the regions we seek are defined by convergence versus
non-convergence, and it is likely that no single value separates these regions. Thus,
methods that are often used for structural reliability cannot be directly applied,
though similar concepts could be used to develop a procedure that helps identify
convergent subregions.

Another potential use of the sequential CART procedure is as a method for screening
variables, potentially reducing the number of variables to be explored in subsequent
experimentation. In some of the problems considered here, some of the convergent
parameter subregions extended over nearly the entire original range of individual
parameters, and did so for most of the convergent subregions found (cf. α_mag for
the mountain car problem and for the TTBU problem). In these cases, a specific
parameter range (as a subset of the range explored) may not be required for
convergence. Setting these parameters at their average values would make any
subsequent experimentation easier because fewer variables would remain.
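
The screening idea can be illustrated with a minimal sketch. It assumes that each convergent subregion is available as a box of per-parameter intervals, and it flags a parameter when its interval covers nearly all of the original range in most subregions. The function names, the thresholds `min_coverage` and `min_share`, the parameters `param_b` and `param_c`, and all numbers are hypothetical. Using the midpoint of the original range as the "average value" is also an assumption.

```python
from typing import Dict, List, Tuple

Interval = Tuple[float, float]
Box = Dict[str, Interval]


def coverage(sub: Interval, full: Interval) -> float:
    """Fraction of the original range `full` covered by the interval `sub`."""
    lo = max(sub[0], full[0])
    hi = min(sub[1], full[1])
    return max(0.0, hi - lo) / (full[1] - full[0])


def screen_parameters(full_ranges: Box, subregions: List[Box],
                      min_coverage: float = 0.9,
                      min_share: float = 0.5) -> Dict[str, float]:
    """Return {parameter: fixed value} for parameters that could be screened out.

    A parameter is flagged when, in more than `min_share` of the convergent
    subregions, its interval covers at least `min_coverage` of the original range.
    The fixed value is the midpoint of the original range (an assumption).
    """
    fixed: Dict[str, float] = {}
    for name, full in full_ranges.items():
        # A subregion that does not constrain a parameter spans its full range.
        wide = sum(coverage(box.get(name, full), full) >= min_coverage
                   for box in subregions)
        if subregions and wide / len(subregions) > min_share:
            fixed[name] = 0.5 * (full[0] + full[1])
    return fixed


# Made-up ranges and subregions, for illustration only.
full_ranges = {"alpha_mag": (0.0, 1.0), "param_b": (0.0, 1.0), "param_c": (0.0, 1.0)}
subregions = [
    {"alpha_mag": (0.0, 0.95), "param_b": (0.8, 1.0), "param_c": (0.2, 0.6)},
    {"alpha_mag": (0.05, 1.0), "param_b": (0.9, 1.0), "param_c": (0.0, 0.4)},
    {"alpha_mag": (0.0, 1.0), "param_b": (0.7, 0.95), "param_c": (0.5, 1.0)},
]
fixed = screen_parameters(full_ranges, subregions)
print(fixed)
```

In this toy data only `alpha_mag` spans nearly its whole range in every subregion, so it is the only parameter proposed for fixing. The two thresholds are judgment calls and would need to be chosen for the problem at hand.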

Ideally, the subregions labeled as convergent by the sequential CART procedure
would contain purely convergent runs. However, due to the random sampling of the
experimental design points and a limited computational budget, achieving pure
subregions was not possible. Increasing the purity of these subregions would require
more design points and more replicates for each design point. An obvious extension
to the current sequential CART algorithm is to parallelize the experimentation within
each iteration of the algorithm, which would allow for more design runs and thereby
improve the accuracy of our results.
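
A minimal sketch of this extension follows. It assumes that the runs within one iteration are independent of each other, so every (design point, replicate) pair can be dispatched concurrently. `run_experiment` is a hypothetical placeholder for a single reinforcement learning run, with a toy convergence rule on a made-up parameter `x`. A thread pool is used only to keep the sketch easy to run. For CPU-bound runs, `concurrent.futures.ProcessPoolExecutor` offers the same `map` interface.

```python
import random
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple

Task = Tuple[Dict[str, float], int]


def run_experiment(task: Task) -> bool:
    """Hypothetical stand-in for one RL run; returns True if it 'converged'."""
    point, seed = task
    rng = random.Random(seed)  # one seed per run keeps results reproducible
    # Toy rule: convergence becomes less likely as the made-up parameter x grows.
    return rng.random() > point["x"]


def run_iteration(points: List[Dict[str, float]], n_replicates: int,
                  max_workers: int = 4) -> List[float]:
    """Run all replicates of all design points of one iteration concurrently.

    Returns the proportion of convergent replicates for each design point.
    """
    tasks: List[Task] = [(p, 1000 * i + r)
                         for i, p in enumerate(points)
                         for r in range(n_replicates)]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() returns results in task order, so replicates stay grouped.
        outcomes = list(pool.map(run_experiment, tasks))
    return [sum(outcomes[i * n_replicates:(i + 1) * n_replicates]) / n_replicates
            for i in range(len(points))]


points = [{"x": 0.0}, {"x": 0.5}, {"x": 1.0}]
proportions = run_iteration(points, n_replicates=20)
print(proportions)
```

The per-point proportions could then serve as the response passed to the tree-fitting step of the next iteration. The overall iterations remain sequential, since each one depends on the subregions found by the previous one.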

8.2.2 Stochastic Kriging

Stochastic kriging is a relatively recent extension of deterministic kriging. A number
of studies explore the effects of experimentation and modeling on either stochastic or
deterministic kriging, including the experimental design (Chen and Lin 2013), the use
of common random numbers in the experimental design (Chen et al. 2012), bootstrap
model parameter estimation (Kleijnen 2013), and the use of gradient estimators to
improve the metamodeling (Chen et al. 2013). However, these studies focus on
low-dimensional (i.e., 1-D or 2-D) benchmark problems, and it is unknown how these
methods extend to high-dimensional problems.
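
To make the relationship between the two kriging variants concrete, the following is a minimal 1-D sketch of a stochastic kriging predictor. It follows the commonly stated form in which the intrinsic (simulation) variance of each sample mean, V(x_i)/n_i, is added to the diagonal of the spatial covariance matrix. The Gaussian covariance function and the treatment of the hyperparameters `beta0`, `tau2`, and `theta` as known are assumptions made for brevity. In practice they would be estimated from the data, and common random numbers are not considered. All data values are made up.

```python
import math
from typing import List


def gauss_cov(a: float, b: float, tau2: float, theta: float) -> float:
    """Gaussian covariance of the extrinsic (spatial) process."""
    return tau2 * math.exp(-theta * (a - b) ** 2)


def solve(A: List[List[float]], b: List[float]) -> List[float]:
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def sk_predict(x0: float, xs: List[float], ybar: List[float],
               noise_var: List[float], n_reps: List[int],
               beta0: float, tau2: float, theta: float) -> float:
    """Predict the response at x0 from sample means ybar at design points xs."""
    k = len(xs)
    # Spatial covariance plus the variance of each sample mean on the diagonal.
    S = [[gauss_cov(xs[i], xs[j], tau2, theta)
          + (noise_var[i] / n_reps[i] if i == j else 0.0)
          for j in range(k)] for i in range(k)]
    w = solve(S, [y - beta0 for y in ybar])
    return beta0 + sum(gauss_cov(x0, xs[i], tau2, theta) * w[i] for i in range(k))


xs = [0.0, 0.5, 1.0]
ybar = [1.0, 2.0, 0.5]   # made-up sample means at the design points
reps = [5, 5, 5]
exact = sk_predict(0.5, xs, ybar, [0.0, 0.0, 0.0], reps, beta0=1.0, tau2=1.0, theta=4.0)
noisy = sk_predict(0.5, xs, ybar, [1.0, 1.0, 1.0], reps, beta0=1.0, tau2=1.0, theta=4.0)
print(exact, noisy)
```

With zero intrinsic variance the predictor reduces to deterministic kriging and reproduces the observed value at a design point. With nonzero variance it no longer interpolates, and the prediction is pulled toward `beta0`.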
