The Tandem Truck Backer-Upper Problem - Design of Experiments for Reinforcement Learning - page 134


[Figure: five panels, one per parameter (α_mag, α_ratio, λ, γ, ε), each comparing Convergent and Non-convergent runs against Parameter value. KS statistics: α_mag = 0.0734, α_ratio = 0.048, λ = 0.252, γ = 0.211, ε = 0.183.]

Fig. 7.5 Regional sensitivity analysis based on convergence of all experimental runs for the TTBU problem.
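Regional sensitivity analysis of this kind typically splits the experimental runs into two groups (here, convergent and non-convergent) and, for each parameter, compares the empirical distributions of its values in the two groups using the two-sample Kolmogorov-Smirnov statistic D = max over x of |F_conv(x) - F_non(x)|. The sketch below assumes that is what the KS values in Fig. 7.5 report. The run-record layout (a dict per run with a boolean "converged" flag), the parameter names `lam` and `gamma`, and the toy data are all hypothetical and are not taken from the TTBU experiments.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic D = max_x |F_a(x) - F_b(x)|."""
    if not sample_a or not sample_b:
        raise ValueError("both samples must be non-empty")
    a, b = sorted(sample_a), sorted(sample_b)
    d = 0.0
    # The two empirical CDFs only change at observed values, so evaluating
    # their difference at every observed value finds the supremum.
    for x in a + b:
        f_a = bisect_right(a, x) / len(a)  # fraction of a that is <= x
        f_b = bisect_right(b, x) / len(b)  # fraction of b that is <= x
        d = max(d, abs(f_a - f_b))
    return d


def regional_sensitivity(runs, param_names):
    """Return {parameter: KS statistic} comparing convergent vs. non-convergent runs.

    Each run is assumed (hypothetically) to be a dict holding one value per
    parameter plus a boolean "converged" flag.
    """
    convergent = [r for r in runs if r["converged"]]
    non_convergent = [r for r in runs if not r["converged"]]
    return {
        name: ks_statistic([r[name] for r in convergent],
                           [r[name] for r in non_convergent])
        for name in param_names
    }


# Toy, made-up runs purely to show the call; not data from the TTBU experiments.
toy_runs = [
    {"lam": 0.45, "gamma": 0.97, "converged": True},
    {"lam": 0.50, "gamma": 0.98, "converged": True},
    {"lam": 0.65, "gamma": 0.97, "converged": False},
    {"lam": 0.70, "gamma": 0.98, "converged": False},
]
print(regional_sensitivity(toy_runs, ["lam", "gamma"]))
# -> {'lam': 1.0, 'gamma': 0.0}
```

A larger D means a parameter's values differ more between convergent and non-convergent runs, so convergence is more sensitive to that parameter. In the toy data, `lam` separates the two groups completely while `gamma` does not separate them at all.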

which is essential to solving this problem. Furthermore, we find ranges of the learning algorithm parameters that perform well at controlling the truck to the goal location, which is a notable achievement. Although a robust controller that can be used more generally would be preferable, strictly speaking all that an implementation requires is a single convergent learning run, such as the one shown in Fig. 7.4.

Because this problem is challenging, and partly because of the basic reinforcement learning strategy used, we believe that a sequential learning approach is essential for obtaining a refined controller. The work presented here thus constitutes perhaps the first stage of such a sequential learning process. Subsequent training could improve the current controller in several ways. The tolerances on the goal location (and orientation) could be reduced so that the truck is backed up to a more precise location. The controller could also be trained to generalize, so that it can successfully back up the truck from anywhere in the domain. These approaches could be applied in separate stages or combined in an adaptive learning procedure that tightens the goal tolerances or expands the state-space coverage based on the current learning performance. Regardless of the sequential training approach, the network weights from each training stage would serve as the starting weights for the next stage. We note that it is always possible that a new network with random weights could also be trained
