Table 7.3 Parameter settings for sequential CART modeling and CART models.
Description                                              Value
Number of variables                                      5
Initial number of design points                          125
Replications per design point                            3
Number of sequential CART iterations                     3
Number of additional design points per low-leaf          25
Sample design method                                     Latin hypercube
Minimum design points per split (a)                      0.02 y
Minimum design points per leaf (a)                       0.01 y
Minimum complexity change per split                      0.001
Maximum number of surrogates                             5
Proportion of low points per leaf                        0.8
Convergence proportion threshold                         0.9
Minimum convergence proportion per convergent domain     0.05
Minimum design points per convergent domain              5
(a) y is the length of the response vector for the current iteration
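To make the settings in Table 7.3 concrete, the following is a minimal sketch of how the initial design and the size-dependent split limits could be computed. It is an illustration under stated assumptions, not the implementation used in this work: the class and function names are hypothetical, the Latin hypercube sampler is a plain textbook version on the unit cube, and the response-vector length is assumed to equal design points times replications in the first iteration.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class SequentialCartSettings:
    """Values copied from Table 7.3; the field names are illustrative only."""
    n_variables: int = 5
    n_initial_points: int = 125
    replications: int = 3
    min_split_fraction: float = 0.02  # multiplied by the response-vector length
    min_leaf_fraction: float = 0.01   # multiplied by the response-vector length


def latin_hypercube(n_points, n_dims, rng):
    """Return n_points samples in [0, 1)^n_dims with one sample per stratum per dimension."""
    columns = []
    for _ in range(n_dims):
        # One uniform draw inside each of the n_points equal-width strata.
        column = [(i + rng.random()) / n_points for i in range(n_points)]
        rng.shuffle(column)  # decouple the stratum order across dimensions
        columns.append(column)
    # Transpose the per-dimension columns into a list of design points.
    return [tuple(col[i] for col in columns) for i in range(n_points)]


def split_thresholds(settings, response_length):
    """Minimum design points per split and per leaf for a given response-vector length."""
    return (settings.min_split_fraction * response_length,
            settings.min_leaf_fraction * response_length)


settings = SequentialCartSettings()
design = latin_hypercube(settings.n_initial_points, settings.n_variables, random.Random(0))

# Assumption: each replication of each design point contributes one response.
response_length = settings.n_initial_points * settings.replications
min_split, min_leaf = split_thresholds(settings, response_length)
```

Under the assumed response length of 375, the two limits evaluate to 7.5 and 3.75 design points. How such fractional values are rounded is not specified in the table, so the sketch leaves them unrounded.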
7.2.1 Convergent Subregions
The tandem truck backer-upper problem is significantly more challenging than the
other problems included in this work because it has a higher-dimensional
state space and control of the truck is rather fragile. These domain features make
learning a consistently successful control strategy through reinforcement learning
considerably more difficult. With regard to the current work, this likely translates
into convergent parameter subregions that are quite small, or into convergence that
is unstable and variable for the same parameters. This was observed during the
experimental runs for the sequential CART modeling.
In the mountain car and single trailer truck backer-upper problems, the performance
measure used in sequential CART modeling was based on the number of time
steps required to reach the goal after the empirical convergence criteria had been
met, and the convergence indicator was based on whether or not the empirical
convergence criteria had been satisfied. In the tandem truck backer-upper problem,
however, the empirical convergence criteria are very infrequently satisfied using
the same convergence parameters as in the other problems. In light of the difficulty
of this problem, and considering that, in this initial learning phase, we are only
interested in learning a very general control strategy, we relax the convergence criteria
and change the performance metric used in sequential CART modeling.
Thus, rather than using the number of time steps to the goal at convergence,
we define a performance metric $\mathrm{perf} = 1 - \max p_{\mathrm{goal}}$, where
$\max p_{\mathrm{goal}}$ is the maximum moving average of the proportion of times
the goal is reached over all
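As a rough illustration of a metric of this kind, the sketch below reads the definition as one minus the largest moving average of a 0/1 goal-reached indicator. The function names, the window length, and the idea that the indicator is recorded once per learning episode are assumptions made for the example. They are not taken from the text.

```python
def max_moving_average(goal_reached, window):
    """Largest mean over any `window` consecutive 0/1 outcomes."""
    if window <= 0 or window > len(goal_reached):
        raise ValueError("window must be between 1 and len(goal_reached)")
    running = sum(goal_reached[:window])  # sum of the first window
    best = running
    for i in range(window, len(goal_reached)):
        # Slide the window one step: add the new outcome, drop the oldest.
        running += goal_reached[i] - goal_reached[i - window]
        best = max(best, running)
    return best / window


def performance(goal_reached, window):
    """perf = 1 - max p_goal; lower values mean the goal was reached more often."""
    return 1.0 - max_moving_average(goal_reached, window)


# Hypothetical outcomes: 1 if the goal was reached, 0 otherwise.
outcomes = [0, 0, 1, 0, 1, 1, 1, 0, 1, 1]
perf = performance(outcomes, window=4)  # best window holds 3 successes, so perf = 0.25
```

Defining the metric as one minus the proportion keeps it on the same footing as a time-steps-to-goal measure, in that smaller values indicate better behavior.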