Recommendations as a Game: Reinforcement Learning for Recommendation Engines - Realtime Data Mining

4.1 Basic Approach

As described initially, each product view represents a state $s$, and each recommendation of another product represents an action $a$. Each web session (session for short) forms an episode.

As a result, the interaction between user and recommendation engine in each web session can be considered as a sequence of product transitions under the influence of recommendations (Fig. 4.2).

This permits us to model the most important statistical characteristics, such as action values, transition probabilities, and rewards, using rules $s \to s'$, which can be saved, for instance, in files or database tables (we will explain the details of this later). Of course, not every action will necessarily lead to an accepted recommendation: the user can also ignore the recommendations and go to an entirely unrelated product. In this case, however, the product transition is added as a new rule to the rule base and thus provides a new potential action.
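The rule base described above can be sketched as follows. This is only a minimal illustration, not the storage scheme the book goes on to explain: the class names `Rule` and `RuleBase`, the three counters, and the in-memory dictionary are all assumptions made for this example. The sketch shows how a transition to a product that was not recommended creates a new rule and thereby a new potential action.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Rule:
    """Counters for one rule s -> s' (field names are illustrative assumptions)."""
    recommended: int = 0  # times s' was shown as a recommendation on s
    accepted: int = 0     # times the user moved s -> s' while s' was recommended
    unprompted: int = 0   # times the user moved s -> s' without a recommendation


class RuleBase:
    """In-memory stand-in for the files or database tables mentioned in the text."""

    def __init__(self):
        # Rules are keyed by the pair (s, s'); a missing rule is created on first access.
        self.rules = defaultdict(Rule)

    def observe(self, s, shown, s_next):
        """Record one step: products `shown` were recommended on s, user went to s_next."""
        for a in shown:
            self.rules[(s, a)].recommended += 1
        rule = self.rules[(s, s_next)]  # adds a new rule if this transition is unseen
        if s_next in shown:
            rule.accepted += 1
        else:
            rule.unprompted += 1

    def actions(self, s):
        """All products reachable from s by a stored rule, i.e. the potential actions."""
        return sorted(target for (source, target) in self.rules if source == s)


rb = RuleBase()
rb.observe("A", ["B", "C"], "B")  # recommendation B accepted
rb.observe("A", ["B", "C"], "D")  # recommendations ignored, user went to D
print(rb.actions("A"))
```

After the second step, product `D` appears among the potential actions of `A` even though it was never recommended there, which mirrors the behaviour described in the paragraph above.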

Since all actions $a$ represent product views, the sets of states $S$ and actions $A$ are isomorphic:

$$S \cong A. \tag{4.1}$$

It should be noted that, for reasons of complexity, not all actions in $A$ are considered for each state $s$, but only a subset $A(s)$, which initially contains all product transitions that have actually occurred, together with actions derived by other means such as hierarchies (Chap. 6). In this way, the action set $A(s)$ expands dynamically in the course of the learning process.
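To make the restricted action set concrete, here is a small sketch of how $A(s)$ might be assembled and how it grows. The helper names `record_transition` and `action_set` are hypothetical, and using "products in the same category" as the hierarchy-derived actions is purely an assumption for illustration; the actual hierarchy approach is the subject of Chap. 6. Excluding $s$ from its own action set is likewise an assumption.

```python
def record_transition(observed, s, s_next):
    """Remember that the transition s -> s_next has actually occurred."""
    observed.setdefault(s, set()).add(s_next)


def action_set(s, observed, category_of):
    """Return A(s): observed successors of s plus same-category products.

    `observed` maps a product to the set of products users moved to from it.
    `category_of` maps a product to its category (assumed stand-in for a hierarchy).
    """
    actions = set(observed.get(s, ()))
    category = category_of.get(s)
    if category is not None:
        actions |= {p for p, c in category_of.items() if c == category}
    actions.discard(s)  # assumption: a product is never recommended on its own page
    return actions


observed = {"tent": {"stove"}}
category_of = {"tent": "camping", "sleeping_bag": "camping", "stove": "cooking"}

before = action_set("tent", observed, category_of)
record_transition(observed, "tent", "lantern")  # a new transition is observed
after = action_set("tent", observed, category_of)
print(sorted(before), sorted(after))
```

The set returned for `"tent"` gains `"lantern"` once that transition has been seen, so only a small, growing subset of all products is ever considered per state.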

In accordance with (4.1), we introduce the notation $s_a$ for the product associated with the recommendation $a$ (i.e., the recommended product "a"). Conversely, $a_s$ represents the recommendation associated with the product $s$. The successor states corresponding to the recommendations of the action set $A(s)$ are denoted by $S_{A(s)}$, and thus the isomorphism (4.1) also applies to the recommendations of any state $s$: $S_{A(s)} \cong A(s)$. The number of recommendations in $s$, i.e., the cardinality of the action set, is usually denoted by $m$.
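As a small worked illustration of this notation (the concrete set of three recommendations is an assumption chosen only for the example), suppose a state $s$ has the action set shown on the left. Then the successor states, the cardinality, and the back-and-forth mapping between actions and products read:

$$A(s) = \{a_1, a_2, a_3\}, \qquad S_{A(s)} = \{s_{a_1}, s_{a_2}, s_{a_3}\}, \qquad m = |A(s)| = |S_{A(s)}| = 3,$$

$$a_{s_{a_i}} = a_i \quad \text{for } i = 1, 2, 3.$$

The last identity simply states that passing from a recommendation to its product and back returns the same recommendation, which is what the isomorphism $S_{A(s)} \cong A(s)$ expresses.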

At each step, the RE receives a reward $r$. The sum of all the rewards should be maximized over the complete session. The reward is defined for each step as follows: if a product $s$ is placed in the shopping basket or is bought, the preceding action (i.e., the "recommendation" $a$, which has led to the product) receives the

Fig. 4.2 Sequence of products and recommendations as states and actions, including the absorbing state $s_A$
