• Empirical convergence assessment procedure (RL): This work introduced a
novel empirical convergence assessment procedure that automatically, without
human intervention, determines whether a reinforcement learning run has
converged. Whereas most reinforcement learning studies examine only a few
runs, the work herein required that thousands of reinforcement learning runs
be performed, so an automated procedure was essential (a minimal illustrative
sketch appears after this list).
• Convergent subregion parameter range plots (DoE): The sequential CART
procedure produces a unique kind of result: parameter bounds in multiple
dimensions for potentially multiple convergent subregions. Understanding and
comparing these bounds required a novel visualization, the parameter range
plot, which allows high-dimensional (more than two dimensions) parameter
ranges to be visualized (see the second sketch after this list).
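The exact convergence criterion is not reproduced in this section; the following is a minimal sketch of how such an automated check might look, assuming a hypothetical sliding-window rule that declares convergence when the mean episode return stops changing appreciably between two adjacent windows. The window size and tolerance are illustrative values, not those used in this work.

import numpy as np

def has_converged(returns, window=100, tol=0.01):
    """Return True if a run's episode returns appear to have converged.

    returns : 1-D sequence of per-episode returns, in episode order.
    window  : number of episodes per comparison window (assumed value).
    tol     : maximum relative change between window means (assumed value).
    """
    if len(returns) < 2 * window:
        return False  # not enough data to judge yet
    recent = np.mean(returns[-window:])
    previous = np.mean(returns[-2 * window:-window])
    denom = max(abs(previous), 1e-12)  # guard against division by zero
    return abs(recent - previous) / denom < tol

# Example: flag each of many runs without human inspection.
# runs = [np.loadtxt(f"run_{i}.txt") for i in range(1000)]  # hypothetical files
# converged = [has_converged(r) for r in runs]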
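Similarly, the following is a minimal sketch of a parameter range plot, assuming each convergent subregion found by sequential CART is reported as a mapping from parameter name to (low, high) bounds. All parameter names, full ranges, and subregion bounds below are hypothetical placeholders. Each parameter occupies one row, and each subregion's interval is drawn as a horizontal bar, normalized so that all parameters share a common axis.

import matplotlib.pyplot as plt

full_range = {"alpha": (0.0, 1.0), "gamma": (0.0, 1.0), "epsilon": (0.0, 1.0)}
subregions = [
    {"alpha": (0.05, 0.30), "gamma": (0.90, 0.99), "epsilon": (0.00, 0.20)},
    {"alpha": (0.40, 0.60), "gamma": (0.80, 0.95), "epsilon": (0.05, 0.35)},
]

fig, ax = plt.subplots()
params = list(full_range)
for row, p in enumerate(params):
    lo, hi = full_range[p]
    span = hi - lo
    for k, sub in enumerate(subregions):
        s_lo, s_hi = sub[p]
        y = row + 0.2 * (k - 0.5)  # offset bars so subregions do not overlap
        ax.hlines(y, (s_lo - lo) / span, (s_hi - lo) / span,
                  colors=f"C{k}", linewidth=6,
                  label=f"subregion {k + 1}" if row == 0 else "_nolegend_")
ax.set_yticks(range(len(params)))
ax.set_yticklabels(params)
ax.set_xlabel("normalized parameter value")
ax.legend()
plt.show()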
8.4 Future Work
This work integrates two typically disparate fields: reinforcement learning (or,
more generally, machine learning) and design of experiments. We also introduce
a novel methodological procedure, sequential CART, for finding subregions within
a parameter space that have specific characteristics or qualities. This work should
be considered a start toward our overall goal of better understanding reinforcement
learning so that it can be successfully applied to challenging real-world problems.
Because this work comprises so many different elements, there are many avenues to
explore in future work.
• Understanding why parameters have certain effects: Toward a better
understanding of the behavior of reinforcement learning, the next logical
direction, after having found convergent parameter subregions as done in this
work, is to investigate exactly why parameters have the effects that they do.
As we have seen, parameters can have vastly different effects in different
regions of the parameter space for the same problem domain, and their
influence on reinforcement learning convergence can also vary across problem
domains. This strongly suggests that these differences stem from differences
in domain characteristics. These characteristics could be explored with
controlled experiments, using either modified versions of specific domains of
interest or generalized domains that can be tailored to exhibit specific
characteristics (Kalyanakrishnan and Stone 2009, 2011).
• Explore additional learning algorithms and representations: From a
reinforcement learning perspective, it would be interesting to apply the
methodology used herein to other learning algorithms and other
representations. As mentioned, the TD(λ) learning algorithm is the fundamental
reinforcement learning algorithm, but other algorithms have been found to be
more efficient in some cases because they learn from different and/or
additional information (a textbook TD(λ) update is sketched below for
reference).
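For reference, the following is a textbook tabular TD(λ) prediction update with accumulating eligibility traces. It is a generic sketch of the algorithm, not the specific implementation used in this work; the env_step interface, the starting state, and all parameter values are assumptions made for illustration.

import numpy as np

def td_lambda_episode(env_step, n_states, V, alpha=0.1, gamma=0.99, lam=0.9):
    """Run one episode of tabular TD(lambda) value prediction.

    env_step(s) -> (next_state, reward, done) is an assumed interface.
    V is a length-n_states array of state-value estimates, updated in place.
    """
    e = np.zeros(n_states)              # eligibility traces
    s = 0                               # assume episodes start in state 0
    done = False
    while not done:
        s2, r, done = env_step(s)
        # TD error: bootstrapped target minus current estimate
        delta = r + gamma * (0.0 if done else V[s2]) - V[s]
        e[s] += 1.0                     # accumulate trace for current state
        V += alpha * delta * e          # credit all recently visited states
        e *= gamma * lam                # decay all traces
        s = s2
    return V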