• Empirical convergence assessment procedure (RL): This work introduced a
novel empirical convergence assessment procedure that automatically, without
human intervention, determines whether a reinforcement learning run has
converged. Whereas most reinforcement learning studies examine only a few
runs, the work herein required that thousands of reinforcement learning runs
be performed, so an automated procedure was essential (a minimal illustrative
sketch appears after this list).
• Convergent subregion parameter range plots (DoE): The sequential CART
procedure produces a unique kind of result: parameter bounds in multiple
dimensions for potentially multiple convergent subregions. Understanding and
comparing these bounds required a novel visualization, the parameter range
plot, which allows high-dimensional (more than two dimensions) parameter
ranges to be visualized (see the second sketch after this list).
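The exact convergence criterion is not reproduced in this section; the following is a minimal sketch of how such an automated check might look, assuming a hypothetical sliding-window rule that declares convergence when the mean episode return stops changing appreciably between two adjacent windows. The window size and tolerance are illustrative values, not those used in this work.

import numpy as np

def has_converged(returns, window=100, tol=0.01):
    """Return True if a run's episode returns appear to have converged.

    returns : 1-D sequence of per-episode returns, in episode order.
    window  : number of episodes per comparison window (assumed value).
    tol     : maximum relative change between window means (assumed value).
    """
    if len(returns) < 2 * window:
        return False  # not enough data to judge yet
    recent = np.mean(returns[-window:])
    previous = np.mean(returns[-2 * window:-window])
    denom = max(abs(previous), 1e-12)  # guard against division by zero
    return abs(recent - previous) / denom < tol

# Example: flag each of many runs without human inspection.
# runs = [np.loadtxt(f"run_{i}.txt") for i in range(1000)]  # hypothetical files
# converged = [has_converged(r) for r in runs]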
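Similarly, the following is a minimal sketch of a parameter range plot, assuming each convergent subregion found by sequential CART is reported as a mapping from parameter name to (low, high) bounds. All parameter names, full ranges, and subregion bounds below are hypothetical placeholders. Each parameter occupies one row, and each subregion's interval is drawn as a horizontal bar, normalized so that all parameters share a common axis.

import matplotlib.pyplot as plt

full_range = {"alpha": (0.0, 1.0), "gamma": (0.0, 1.0), "epsilon": (0.0, 1.0)}
subregions = [
    {"alpha": (0.05, 0.30), "gamma": (0.90, 0.99), "epsilon": (0.00, 0.20)},
    {"alpha": (0.40, 0.60), "gamma": (0.80, 0.95), "epsilon": (0.05, 0.35)},
]

fig, ax = plt.subplots()
params = list(full_range)
for row, p in enumerate(params):
    lo, hi = full_range[p]
    span = hi - lo
    for k, sub in enumerate(subregions):
        s_lo, s_hi = sub[p]
        y = row + 0.2 * (k - 0.5)  # offset bars so subregions do not overlap
        ax.hlines(y, (s_lo - lo) / span, (s_hi - lo) / span,
                  colors=f"C{k}", linewidth=6,
                  label=f"subregion {k + 1}" if row == 0 else "_nolegend_")
ax.set_yticks(range(len(params)))
ax.set_yticklabels(params)
ax.set_xlabel("normalized parameter value")
ax.legend()
plt.show()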
8.4 Future Work
This work integrates two typically disparate fields: reinforcement learning (or,
more generally, machine learning) and design of experiments. We also introduce
a novel methodological procedure, sequential CART, for finding subregions within
a parameter space that have specific characteristics or qualities. This work should
be considered a start toward our overall goal of better understanding reinforcement
learning so that it can be successfully applied to challenging real-world problems.
Because this work comprises so many different elements, there are many avenues to
explore in future work.
• Understanding why parameters have certain effects: Toward a better
understanding of the behavior of reinforcement learning, the next logical
direction, after having found convergent parameter subregions as done in this
work, is to investigate exactly why parameters have the effects that they do.
As we have seen, parameters can have vastly different effects in different
regions of the parameter space for the same problem domain, and their
influence on reinforcement learning convergence can also vary across problem
domains. This strongly suggests that these differences stem from differences
in domain characteristics. These characteristics could be explored with
controlled experiments, using either modified versions of specific domains of
interest or generalized domains that can be tailored to exhibit specific
characteristics (Kalyanakrishnan and Stone 2009, 2011).
• Explore additional learning algorithms and representations: From a
reinforcement learning perspective, it would be interesting to apply the
methodology used herein to other learning algorithms and other
representations. As mentioned, the TD(λ) learning algorithm is the fundamental
reinforcement learning algorithm, but other algorithms have been found to be
more efficient in some cases because they learn from different and/or
additional information (a textbook TD(λ) update is sketched below for
reference).
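For reference, the following is a textbook tabular TD(λ) prediction update with accumulating eligibility traces. It is a generic sketch of the algorithm, not the specific implementation used in this work; the env_step interface, the starting state, and all parameter values are assumptions made for illustration.

import numpy as np

def td_lambda_episode(env_step, n_states, V, alpha=0.1, gamma=0.99, lam=0.9):
    """Run one episode of tabular TD(lambda) value prediction.

    env_step(s) -> (next_state, reward, done) is an assumed interface.
    V is a length-n_states array of state-value estimates, updated in place.
    """
    e = np.zeros(n_states)              # eligibility traces
    s = 0                               # assume episodes start in state 0
    done = False
    while not done:
        s2, r, done = env_step(s)
        # TD error: bootstrapped target minus current estimate
        delta = r + gamma * (0.0 if done else V[s2]) - V[s]
        e[s] += 1.0                     # accumulate trace for current state
        V += alpha * delta * e          # credit all recently visited states
        e *= gamma * lam                # decay all traces
        s = s2
    return V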