Methodology - Design of Experiments for Reinforcement Learning

that the convergent subregions surpass some quantitative or qualitative metrics. These quality metrics essentially allow low-quality subregions, those deemed poor in some respect, to be pruned, which prevents potentially endless exploration of ever smaller subregions.

Just as subregions must surpass some threshold convergence proportion $\theta_\psi$ to be considered convergent, subregions may also be pruned for having very few design points or a very low convergence proportion $\beta$. Because each problem may have a very different underlying structure and experimental designs may vary in size, the threshold values for the minimum number of points in each leaf node and for the minimum convergence proportion are domain specific. The CART modeling approach can also produce leaf nodes that are oddly shaped or sized, so measures of the shape and size of the subregions could be used to exclude additional subregions from further exploration. Such measures include the subregion aspect ratio $\alpha^{(i)}_{q,p}$ and the radius $r^{(i)}_{q,p}$ (Eqs. 4.1-4.2). In these equations, $b_{j+d}$ and $b_j$ are the $(j+d)$th and $j$th elements of the subregion boundaries defined by $B^{(i)}_{q,p}$, so that $b_{j+d} - b_j$ is the width of the subregion in dimension $j$, and $d$ is the number of dimensions.

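$$\alpha^{(i)}_{q,p} = \frac{\max_j \left(b_{j+d} - b_j\right)^{(i)}_{q,p}}{\min_j \left(b_{j+d} - b_j\right)^{(i)}_{q,p}} \qquad (4.1)$$

$$r^{(i)}_{q,p} = \sqrt{\sum_{j=1}^{d} \left[\left(b_{j+d} - b_j\right)^{(i)}_{q,p}\right]^{2}} \qquad (4.2)$$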
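The two shape measures can be computed directly from a subregion's boundary vector. What follows is a minimal sketch, not the author's implementation. It assumes the boundary vector holds $d$ lower bounds followed by $d$ upper bounds, already scaled to [0, 1]. It also assumes the radius is the Euclidean length of the width vector (a square-root-of-sum-of-squares reading of Eq. 4.2). The function names are hypothetical.

```python
import math
from typing import List, Sequence


def side_lengths(bounds: Sequence[float]) -> List[float]:
    """Return the widths b_{j+d} - b_j of a subregion, one per dimension.

    Assumes `bounds` stores d lower bounds followed by d upper bounds,
    already scaled so the original parameter space spans [0, 1].
    """
    if len(bounds) % 2 != 0:
        raise ValueError("bounds must have even length 2d")
    d = len(bounds) // 2
    return [bounds[j + d] - bounds[j] for j in range(d)]


def aspect_ratio(bounds: Sequence[float]) -> float:
    """Eq. 4.1: widest dimension divided by narrowest dimension."""
    widths = side_lengths(bounds)
    return max(widths) / min(widths)


def radius(bounds: Sequence[float]) -> float:
    """Eq. 4.2 read as the Euclidean length of the width vector.

    Assumption: if a center-to-corner radius is intended, halve this value.
    """
    return math.sqrt(sum(w * w for w in side_lengths(bounds)))


# Scaled 2-D subregion: x1 in [0.0, 0.5], x2 in [0.25, 0.5]
b = [0.0, 0.25, 0.5, 0.5]
print(aspect_ratio(b))  # 2.0
print(radius(b))
```

A large aspect ratio flags long, thin leaves. A small radius flags leaves that are narrow in every dimension.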

Unlike the design runs in $X$, which can remain unscaled, the measures above should be computed on scaled parameters. Scaling maps the original parameter space, defined by the bounds in $B^{(0)}_{0}$, onto [0, 1] in each dimension so that all dimensions are comparable. These measures would then be compared against thresholds analogous to $\theta_\psi$: a subregion $A^{(i)}_{q,p}$ may be pruned if $\alpha^{(i)}_{q,p} > \theta_\alpha$ or $r^{(i)}_{q,p} < \theta_r$. These threshold checks would be added as if-statements in Algorithm 2 at line 13, similar to the assessment of $\beta^{(i)}_{q,p}$ on line 11.
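The pruning rules above can be combined into a single predicate. This is a hypothetical sketch, not the author's Algorithm 2. The threshold names `min_points`, `min_conv`, `max_aspect` and `min_radius` are illustrative stand-ins for the domain-specific minimum leaf size, the minimum convergence proportion, $\theta_\alpha$ and $\theta_r$. The sketch assumes both boundary vectors store $d$ lower bounds followed by $d$ upper bounds. Only the subregion bounds are rescaled to [0, 1] using the original space, and the design points themselves are left untouched.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class PruneThresholds:
    # Hypothetical names; the text leaves all of these values domain specific.
    min_points: int    # minimum number of design points in a leaf
    min_conv: float    # minimum convergence proportion (beta)
    max_aspect: float  # theta_alpha
    min_radius: float  # theta_r


def scale_bounds(bounds: Sequence[float], original: Sequence[float]) -> List[float]:
    """Map subregion bounds to [0, 1] per dimension using the original space."""
    d = len(original) // 2
    lo, hi = original[:d], original[d:]
    # Index k % d gives the dimension of both lower (k < d) and upper entries.
    return [(bounds[k] - lo[k % d]) / (hi[k % d] - lo[k % d]) for k in range(2 * d)]


def should_prune(bounds: Sequence[float], original: Sequence[float],
                 n_points: int, conv_prop: float, th: PruneThresholds) -> bool:
    """Return True if a subregion should be excluded from further exploration."""
    if n_points < th.min_points or conv_prop < th.min_conv:
        return True
    scaled = scale_bounds(bounds, original)
    d = len(scaled) // 2
    widths = [scaled[j + d] - scaled[j] for j in range(d)]
    aspect = max(widths) / min(widths)           # Eq. 4.1
    r = math.sqrt(sum(w * w for w in widths))    # Eq. 4.2 (assumed sqrt form)
    return aspect > th.max_aspect or r < th.min_radius


original = [0.0, 0.0, 10.0, 1.0]  # x1 in [0, 10], x2 in [0, 1]
th = PruneThresholds(min_points=5, min_conv=0.2, max_aspect=4.0, min_radius=0.05)
print(should_prune([0.0, 0.0, 5.0, 0.1], original, 20, 0.8, th))  # True (long, thin leaf)
print(should_prune([0.0, 0.0, 5.0, 0.5], original, 20, 0.8, th))  # False
```

Without scaling, the first leaf would have widths 5.0 and 0.1 and an aspect ratio of 50. The inflated value would come purely from the differing parameter units, which is why the text computes these measures on the scaled space.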

4.1.3 Analysis of Sequential CART

The purpose of the sequential CART procedure is to determine convergent parameter subregions in the reinforcement learning problems studied in this work. In each of the reinforcement learning problem domains we study, we apply the same analysis to understand the behavior of this sequential experimental procedure and to characterize the convergent subregions it identifies.

We use a novel visualization to show the range and location of the convergent subregion bounds in multiple dimensions. In these figures, each group of lines represents a single variable, where the minimum and maximum values of the original parameter space (over which the original experimental design was created) are shown on the
