Database Reference
In-Depth Information
Chapter 6
Up the Down Staircase: Hierarchical
Reinforcement Learning
Abstract We address the question of how hierarchical, or multigrid, methods may
figure in dynamic programming and reinforcement learning for recommendation
engines.
After providing a general introduction, we approach the framework of hierar-
chical methods from both the historical analytical and algebraic viewpoints; we
proceed to devising and justifying approaches to apply hierarchical methods to both
the model-based as well as the model-free case. In regard to the latter, we set out
from the multigrid reinforcement learning algorithms introduced by Ziv in [Ziv04]
and extend these methods to finite-horizon problems.
Back in Chap. 4 we established that reinforcement learning methods usually con-
verge only slowly. The introduction of hierarchical concepts is necessary in order to
solve this problem. Instead of trying to solve the Bellman equation ( 3.6 ) directly,
using the GPI method on enormous quantities of states and actions, we must split the
problem into a hierarchy of subtasks and then solve them in succession.
For many years, the development of hierarchical solution methods has been one
of the central fields of research for RL. Here we commonly distinguish between
temporal approaches, that is, the aggregation of actions over a sequence of steps,
and spatial approaches, that is, the aggregation over states. There exist a number of
spatial approaches [AR02, Diet00, MRLG05, PR98, SPS99]. However, they do not
seem to be suitable for our recommendation engine approach.
But state aggregations do. We remember that the Bellman equation can be
regarded as a discrete counterpart of a differential equation. Therefore, it is
evident that the multilevel methods developed in the course of numerical anal-
ysis - which are used especially for differential equations - can be applied to
provide a solution to the Bellman equation. This approach was first developed by
the Israeli information technologist Omer Ziv in his remarkable doctoral thesis
[Ziv04], in which at the same time he proved fundamental convergence state-
ments. We can develop these approaches for recommendation engines further, in
Search WWH ::




Custom Search