Up the Down Staircase: Hierarchical Reinforcement Learning - Realtime Data Mining


Fig. 6.7 Illustration of the update logic of the preconditioner C 1. G: s_0 and associated states; H: a_0 and associated actions. [Figure: states s_0 to s_4 and actions a_0 to a_3, with rules including (s_0, a_0), (s_0, a_1), (s_2, a_0), and (s_3, a_3).]

This results from the reflexive coarse-grid action y_1 of the group y_1 on itself.

Of course the reflexive update for x_1 is especially strong here.

An update via the action x_1 leads to an update of the action x_1 itself and also of x_2:

x_1 = (1 + \frac{1}{2} e) x_1, \qquad x_2 = \frac{1}{2} e x_1.

It is the coarse-grid action y_1 of the group y_1 on y_2 that is responsible for this.

■

Figure 6.7 illustrates the general logic of the updates using the example of the action (s_0, a_0).

An update of the rule (s_0, a_0) therefore updates not only the rule itself but also all rules that lead from the state group G of the initial product s_0 into the action group H of the recommended product a_0.
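As a rough illustration of this update logic, the following Python sketch spreads an update of one rule over all existing rules that lead from the state group G into the action group H. The dictionary layout, the function name `propagate_update`, the group labels, and the damping factor `coarse_weight` are assumptions made for this example; the actual weights applied by the preconditioner are not reproduced here.

```python
def propagate_update(q, state_group, action_group, s0, a0, delta, coarse_weight=0.5):
    """Update rule (s0, a0) and, with a damped share, every other existing
    rule whose state lies in the group of s0 and whose action lies in the
    group of a0. The weighting is a hypothetical choice for illustration."""
    g = state_group[s0]   # state group G of the initial product
    h = action_group[a0]  # action group H of the recommended product
    touched = []
    for (s, a) in q:
        if state_group.get(s) == g and action_group.get(a) == h:
            # The updated rule itself receives the full increment,
            # its group neighbours only a damped share.
            q[(s, a)] += delta if (s, a) == (s0, a0) else coarse_weight * delta
            touched.append((s, a))
    return sorted(touched)


# Hypothetical action values and group assignments.
q = {("s0", "a0"): 1.0, ("s0", "a1"): 2.0, ("s2", "a0"): 0.5, ("s3", "a3"): 4.0}
state_group = {"s0": "G", "s2": "G", "s3": "G2"}
action_group = {"a0": "H", "a1": "H", "a3": "H2"}

touched = propagate_update(q, state_group, action_group, "s0", "a0", delta=0.5)
print(touched)
print(q)
```

In this sketch the rule (s3, a3) stays unchanged because neither its state nor its action belongs to the groups of the updated rule.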

From a technical point of view, there is another positive aspect: when the preconditioner C 1 updates the action value of a state-action pair (s, a) for which no rule exists yet, that rule can be generated automatically. In this way the hierarchical preconditioner automatically generates new recommendations for products that have none because their transaction history is missing or too short.
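The automatic generation of rules can be sketched in the same spirit. The following Python example walks over every pair in G x H and creates a rule whenever the update reaches a pair that has none yet. The function name `update_and_generate`, the start value `initial`, and the damping factor `coarse_weight` are hypothetical choices for this sketch, not the book's implementation.

```python
from itertools import product


def update_and_generate(q, G, H, s0, a0, delta, coarse_weight=0.5, initial=0.0):
    """Apply an update of rule (s0, a0) to all pairs in G x H and create
    missing rules on the fly. Returns the list of newly generated rules."""
    created = []
    for s, a in product(sorted(G), sorted(H)):
        if (s, a) not in q:
            # No rule exists yet for this pair: generate it with a start value.
            q[(s, a)] = initial
            created.append((s, a))
        # Full increment for the updated rule, damped share for the others.
        q[(s, a)] += delta if (s, a) == (s0, a0) else coarse_weight * delta
    return created


# Only one rule is known at the start; s1 has no recommendations at all.
q = {("s0", "a0"): 1.0}
created = update_and_generate(q, G={"s0", "s1"}, H={"a0", "a1"},
                              s0="s0", a0="a0", delta=0.5)
print(created)
print(q)
```

After the call, the product s1 has rules for both a0 and a1 although it contributed no transactions of its own, which is the coverage effect described above.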

We will return to this subject in the next section.

6.3 Learning on Category Level

So far we have considered hierarchical RL only with respect to accelerating convergence. However, as mentioned at the end of the last section, it can also serve a further purpose: raising the recommendation coverage.
