Design of Experiments for the Mountain Car Problem - Design of Experiments for Reinforcement Learning

B.2.3 Experimental Design and Analysis

The goal of this work is to understand the effects of λ, γ, and a third parameter on learning convergence and performance; this work is not aimed at optimizing (i.e., tuning) parameter settings. The study is based on a single experimental design with a two-stage analysis. The first analysis assesses network convergence over a large parameter space. A full factorial experiment (D₁) is run with the following level settings for each continuous parameter: λ over [0.1, 0.9] incremented by 0.1, γ over [0.95, 0.99] incremented by 0.01, and the third parameter over {0.7, 0.8, 0.9}. This experiment therefore consists of 135 factor-level combinations (9 × 5 × 3), with 10 replications at each factor-level combination. The outcome for this experiment is a binary variable indicating (empirical) convergence; recall that convergence requires that the network converge during both training and testing. A logistic regression (LR) model is then created to estimate the probability of convergence based on λ, γ, and the third parameter in D₁.
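
The size of D₁ can be checked by enumerating the grid directly. The following is a minimal sketch, not the authors' code: the names `full_factorial`, `LAMBDA_LEVELS`, `GAMMA_LEVELS`, and `THIRD_LEVELS` are illustrative assumptions, and `THIRD_LEVELS` is only a placeholder for the third factor, whose symbol is not given in the text.

```python
from itertools import product

# Factor levels for D1 as described in the text. "THIRD_LEVELS" is a
# placeholder name (an assumption) for the third factor.
LAMBDA_LEVELS = [k / 10 for k in range(1, 10)]      # 0.1, 0.2, ..., 0.9
GAMMA_LEVELS = [k / 100 for k in range(95, 100)]    # 0.95, 0.96, ..., 0.99
THIRD_LEVELS = [0.7, 0.8, 0.9]
REPLICATIONS = 10


def full_factorial(*level_lists):
    """Return every combination of the given factor levels as a list of tuples."""
    return list(product(*level_lists))


d1 = full_factorial(LAMBDA_LEVELS, GAMMA_LEVELS, THIRD_LEVELS)
n_runs = len(d1) * REPLICATIONS

print(len(d1))   # 135 factor-level combinations
print(n_runs)    # 1350 runs in total
```

Building the levels from integer counters (`k / 10`, `k / 100`) rather than by repeated addition avoids accumulated floating-point drift in the grid values.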

The second analysis aims to determine the effects of λ, γ, and the third parameter on performance over a smaller parameter space in which the network frequently converges. The smaller parameter space D₂ is a subset of, and is extracted from, D₁ (D₂ ⊂ D₁), where D₂ consists of the following level settings: λ = {0.6, 0.7, 0.8}, γ = {0.97, 0.98, 0.99}, and the third parameter = {0.7, 0.8, 0.9}. These factor levels were chosen after assessing network convergence over D₁. This design is a 3 × 3 × 3 full factorial design, with 10 replications at each of the 27 factor-level combinations. Analysis of variance (ANOVA) with Type II sums of squares is used to determine whether λ, γ, and the third parameter (and their interactions) have significant effects on the convergence speed (i.e., the episode at which training converged) and on the mean testing performance. Non-convergent runs are treated as undefined responses, as opposed to missing data, and these runs are removed from the data for the analysis, resulting in unbalanced groups and the need for Type II sums of squares.
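
The relationship between the two designs, and the way that removing non-convergent runs unbalances the groups, can be sketched as follows. This is an illustration under stated assumptions rather than the study's procedure: the helper names `drop_nonconvergent` and `cell_counts` are hypothetical, the third tuple element stands in for the third factor, and the records in `toy_runs` are invented placeholders, not experimental results.

```python
from collections import Counter
from itertools import product

# Each design point is a tuple (lambda, gamma, third factor).
D1 = set(product(
    [k / 10 for k in range(1, 10)],      # lambda: 0.1 ... 0.9
    [k / 100 for k in range(95, 100)],   # gamma: 0.95 ... 0.99
    [0.7, 0.8, 0.9],                     # third factor
))
D2 = set(product([0.6, 0.7, 0.8], [0.97, 0.98, 0.99], [0.7, 0.8, 0.9]))


def drop_nonconvergent(runs):
    """Keep only runs with a defined response (convergence episode is not None)."""
    return [(cell, episode) for cell, episode in runs if episode is not None]


def cell_counts(runs):
    """Count the remaining replications in each factor-level combination."""
    return Counter(cell for cell, _ in runs)


# Invented placeholder records, NOT results from the study:
# (factor-level combination, training-convergence episode or None).
toy_runs = [
    ((0.6, 0.97, 0.7), 120),
    ((0.6, 0.97, 0.7), None),   # non-convergent run: response undefined
    ((0.7, 0.98, 0.8), 95),
    ((0.7, 0.98, 0.8), 101),
]

kept = drop_nonconvergent(toy_runs)
counts = cell_counts(kept)

print(len(D2), D2 < D1)   # 27 True
print(dict(counts))       # unequal counts per cell, i.e. unbalanced groups
```

Once the cell counts differ, the factor effects are no longer orthogonal, which is why the choice of sums-of-squares type matters for the ANOVA.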

B.3 Results

Experimental design D₁ resulted in 77.85 % (1051/1350) of the runs converging during training, and 48.59 % (656/1350) converging based on both training and testing convergence criteria. The proportion of runs that converged at each factor-level combination ranged from 0/10 to 10/10, confirming that some regions of the parameter space are clearly better than others. Figure B.1 shows the empirical probabilities of convergence over D₁. An LR model was created to estimate network convergence using linear, quadratic, and interaction terms (Table B.1), and nearly all terms have statistically significant coefficients. The LR model was used because it provides a compact functional form for predicting convergence in this application, though other function approximators, such as neural networks, could be used to model the convergence probability.
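
For orientation, a logistic regression with linear, quadratic, and interaction terms in three factors has the general second-order form below. This is a generic sketch, assuming x₁ = λ, x₂ = γ, and x₃ as the third factor; the terms actually retained and their coefficient values are those of Table B.1 and are not reproduced here.

```latex
\[
\log\frac{p}{1-p}
  = \beta_0
  + \sum_{i=1}^{3} \beta_i x_i
  + \sum_{i=1}^{3} \beta_{ii} x_i^{2}
  + \sum_{1 \le i < j \le 3} \beta_{ij} x_i x_j ,
\qquad
p = \Pr(\text{convergence} \mid x_1, x_2, x_3).
\]
```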
