Civil Engineering Reference
In-Depth Information
[Figure: (a) Performance and episode termination proportions; horizontal axis: Episode (0 to 10000); legend: Time limit, Jack-knife, Reached goal. (b) Example trajectory after learning.]
Fig. 7.4 Example performance from a sample in subregion 81, with parameters: α_mag = 8.36 × 10^3,
α_ratio = 2.204, λ = 0.695, γ = 0.978, ε = 0.851. The top plot on the left shows the moving
average of the number of time steps to the goal, and the bottom plot shows moving averages of the
proportions of how episodes terminated. The figure on the right shows the trajectory of the trailer
truck backing up after learning had converged. The gray shaded region in this figure indicates the
acceptable error region that the truck must reach.
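The moving averages described in the caption can be illustrated with a minimal sketch. The window length, the outcome labels, and the function names below are assumptions made for illustration; the text does not state how the averages in the figure were actually computed.

```python
from collections import Counter, deque

# Hypothetical labels for the three ways an episode can end.
OUTCOMES = ("time_limit", "jack_knife", "reached_goal")


def moving_average(values, window):
    """Trailing moving average; early entries use however many values exist."""
    recent = deque()
    total = 0.0
    averages = []
    for value in values:
        recent.append(value)
        total += value
        if len(recent) > window:
            total -= recent.popleft()
        averages.append(total / len(recent))
    return averages


def moving_proportions(outcomes, window):
    """For each episode, the share of each outcome within the trailing window."""
    recent = deque()
    counts = Counter()
    proportions = []
    for outcome in outcomes:
        recent.append(outcome)
        counts[outcome] += 1
        if len(recent) > window:
            counts[recent.popleft()] -= 1
        n = len(recent)
        proportions.append({name: counts[name] / n for name in OUTCOMES})
    return proportions
```

Applied to a per-episode log of step counts and termination labels, `moving_average` would give a curve like the top-left plot and `moving_proportions` would give curves like the bottom-left plot.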
is from a test run after learning had converged and shows a direct and smooth path
to the goal.
Regional sensitivity analysis (Fig. 7.5) suggests that the three reinforcement
learning parameters, λ, γ, and ε, have strong effects on convergence, whereas the
magnitude and ratio of the learning rates matter very little. The differences between
the interpretation of the RSA plots and the convergent parameter ranges produced by
CART modeling are likely due to the variability in the responses (convergent or
non-convergent) across learning runs, and to the fact that CART modeling chooses only
the best partition of the available data into convergent and non-convergent
runs. Still, the general shapes of the RSA plots are consistent with the convergent
parameter ranges.
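As a rough illustration of the idea behind regional sensitivity analysis, the sketch below compares the empirical distribution of one parameter in convergent runs against its distribution in non-convergent runs. A large maximum gap between the two cumulative distributions suggests that the parameter influences convergence. This is one common form of RSA, not necessarily the procedure used to produce Fig. 7.5, and the function names and sample data are hypothetical.

```python
from bisect import bisect_right


def ecdf(sample):
    """Return the empirical CDF of a sample as a function of x."""
    ordered = sorted(sample)
    n = len(ordered)
    return lambda x: bisect_right(ordered, x) / n


def rsa_separation(values, converged):
    """Maximum vertical gap between the ECDFs of convergent and non-convergent runs.

    values:    one parameter value per learning run
    converged: one boolean per learning run (True if the run converged)
    """
    conv = [v for v, ok in zip(values, converged) if ok]
    nonconv = [v for v, ok in zip(values, converged) if not ok]
    if not conv or not nonconv:
        raise ValueError("need at least one convergent and one non-convergent run")
    f_conv, f_nonconv = ecdf(conv), ecdf(nonconv)
    # Both ECDFs are step functions, so the largest gap occurs at a sample point.
    return max(abs(f_conv(x) - f_nonconv(x)) for x in set(values))


# Hypothetical data: runs converge only for large values of the parameter.
influential = rsa_separation([0.1, 0.2, 0.3, 0.7, 0.8, 0.9],
                             [False, False, False, True, True, True])
```

A value near 1 indicates that convergent and non-convergent runs occupy different parameter ranges, while a value near 0 indicates that the parameter has little effect on convergence.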
7.3 Discussion
This work is the first of its kind to apply a pure reinforcement learning approach
to the tandem truck backer-upper problem. We do not solve this problem completely,
in the sense of controlling the trailer truck to back up to a very specific
position and orientation from any initial state, but we are able to learn how to
control the truck for a relaxed case of the problem. Even for this relaxed case,
we find that the knowledge that is learned includes how to
avoid jack-knifing the truck, as is evident from the truck reaching the goal location,
 