Up the Down Staircase: Hierarchical Reinforcement Learning - Realtime Data Mining

Database Reference

In-Depth Information

Chapter 6

Up the Down Staircase: Hierarchical

Reinforcement Learning

Abstract We address the question of how hierarchical, or multigrid, methods may

figure in dynamic programming and reinforcement learning for recommendation

engines.

After providing a general introduction, we approach the framework of hierar-

chical methods from both the historical analytical and algebraic viewpoints; we

proceed to devising and justifying approaches to apply hierarchical methods to both

the model-based as well as the model-free case. In regard to the latter, we set out

from the multigrid reinforcement learning algorithms introduced by Ziv in [Ziv04]

and extend these methods to finite-horizon problems.

Back in Chap. 4 we established that reinforcement learning methods usually con-

verge only slowly. The introduction of hierarchical concepts is necessary in order to

solve this problem. Instead of trying to solve the Bellman equation ( 3.6 ) directly,

using the GPI method on enormous quantities of states and actions, we must split the

problem into a hierarchy of subtasks and then solve them in succession.

For many years, the development of hierarchical solution methods has been one

of the central fields of research for RL. Here we commonly distinguish between

temporal approaches, that is, the aggregation of actions over a sequence of steps,

and spatial approaches, that is, the aggregation over states. There exist a number of

spatial approaches [AR02, Diet00, MRLG05, PR98, SPS99]. However, they do not

seem to be suitable for our recommendation engine approach.

But state aggregations do. We remember that the Bellman equation can be

regarded as a discrete counterpart of a differential equation. Therefore, it is

evident that the multilevel methods developed in the course of numerical anal-

ysis - which are used especially for differential equations - can be applied to

provide a solution to the Bellman equation. This approach was first developed by

the Israeli information technologist Omer Ziv in his remarkable doctoral thesis

[Ziv04], in which at the same time he proved fundamental convergence state-

ments. We can develop these approaches for recommendation engines further, in

Search WWH ::

Custom Search

Home