How Engines Learn to Generate Recommendations: Adaptive Learning Algorithms - Realtime Data Mining


Fig. 5.5 Sequence of products and multiple recommendations as states and actions, extended by the generating node s_G (states s_G, s_1, s_2, s_3, ..., s_{n-1}, s_n, s_A; actions a_1, a_2, a_3, ..., a_{n-2}, a_{n-1})

the absorbing state s_A. In some sense the generating state s_G is placed before the
beginning of each session, and the corresponding transition probability p_{s_G s}
specifies the probability that product s will be selected as the first product of the
session (Fig. 5.5).

From a technical point of view, the transitions from the generating state and into the
absorbing state can be conveniently stored as rules s → s' in the same way as all
other product transitions. Thus, the rules s_G → s represent the transition from the
generating state into state s, and the rules s → s_A represent the transition from
state s into the absorbing state.
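As a minimal sketch of this storage scheme, the following Python fragment counts the rules s → s' over a list of sessions after padding each session with the generating and the absorbing state. The labels `S_G` and `S_A`, the function name `count_rules`, and the session format (a list of product identifiers) are assumptions made for illustration only; they are not taken from the engine described here.

```python
from collections import Counter

S_G = "__GEN__"  # hypothetical label for the generating state s_G
S_A = "__ABS__"  # hypothetical label for the absorbing state s_A


def count_rules(sessions):
    """Count rules s -> s' over all sessions, each padded with s_G and s_A."""
    rules = Counter()
    for products in sessions:
        if not products:
            continue  # assumption: empty sessions carry no information
        path = [S_G, *products, S_A]
        # Every pair of consecutive states yields one rule (src -> dst).
        for src, dst in zip(path, path[1:]):
            rules[(src, dst)] += 1
    return rules


sessions = [["A", "B"], ["A", "C"], ["B"]]
rules = count_rules(sessions)
```

Because the two artificial states are treated like ordinary products, no separate data structure is needed: the rule `(S_G, "A")` counts the sessions starting with product A, and `("B", S_A)` counts the sessions ending with product B.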

The flow diagram of the extended simulation is depicted in Fig. 5.6b, whereas
Fig. 5.6a shows the basic simulation of Sect. 4.4 for comparison.

In the first phase of the extended simulation, we go through the historical
transaction data and estimate the transition probabilities and transition rewards,
which form the model of the environment. In contrast to the simulation of Sect. 4.4,
here we do not estimate the unconditional probabilities p_{ss'} but the complete
probabilities p^a_{ss'}, which incorporate the influence of (multiple) recommendations.
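The estimation of the first phase can be sketched as simple relative-frequency counting. The fragment below is only an illustration under stated assumptions: the historical data is assumed to be available as tuples `(s, a, s_next, reward)`, where `a` is a tuple of recommended products, and the function name `estimate_model` is hypothetical. The estimators actually used by the engine may differ.

```python
from collections import defaultdict


def estimate_model(steps):
    """Estimate p^a_{ss'} and the mean transition rewards from logged steps.

    steps: iterable of (s, a, s_next, reward); a is a tuple of recommendations.
    """
    counts = defaultdict(lambda: defaultdict(int))
    reward_sums = defaultdict(float)
    for s, a, s_next, reward in steps:
        counts[(s, a)][s_next] += 1
        reward_sums[(s, a, s_next)] += reward

    probs, rewards = {}, {}
    for (s, a), successors in counts.items():
        total = sum(successors.values())
        for s_next, n in successors.items():
            # Relative frequency of s -> s_next given recommendation a.
            probs[(s, a, s_next)] = n / total
            # Average reward observed on this transition.
            rewards[(s, a, s_next)] = reward_sums[(s, a, s_next)] / n
    return probs, rewards


steps = [
    ("A", ("B",), "B", 1.0),
    ("A", ("B",), "C", 0.0),
    ("A", ("B",), "B", 1.0),
    ("A", ("C",), "C", 1.0),
]
probs, rewards = estimate_model(steps)
```

Keying the counts by the pair (s, a) rather than by s alone is what distinguishes the conditional estimate from the unconditional one of Sect. 4.4.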

This is very important because it allows us to run the actual simulation in the
second phase under quite realistic conditions. At the beginning of each session, we
determine the initial (visited) product by means of the generating node. This product is
passed to the RE algorithm, which learns and at the same time delivers recommendations.
Based on the current product and the recommendations, the simulation
environment calculates the next product and decides whether the product will be
added to the basket and ordered at the end of the session. This information (including
the basket event), in turn, is passed to the RE algorithm, which learns again and
returns new recommendations, and so on. As soon as the transition into the absorbing state
takes place, the session terminates. Before termination, if any products
marked for purchase are ordered, this information is also transferred to the RE
algorithm as a tracking event. Then the next session starts. After the specified number
of virtual sessions has been reached, the simulation terminates.
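The second phase can be sketched as the following loop. This is a simplified illustration, not the simulation used in the text: the class `CountingRecommender` is a hypothetical stand-in for the RE algorithm, the model is assumed to be a dictionary mapping `(state, recommendation)` to a distribution over next states (with `None` as the fallback recommendation), only a single recommendation is issued per step, and every product placed in the basket is assumed to be ordered at the end of the session.

```python
import random

S_G, S_A = "__GEN__", "__ABS__"  # hypothetical labels for s_G and s_A


class CountingRecommender:
    """Hypothetical stand-in for the RE algorithm: most frequent successor."""

    def __init__(self):
        self.counts = {}
        self.orders_seen = 0

    def learn(self, prev, curr):
        if prev is not None:
            succ = self.counts.setdefault(prev, {})
            succ[curr] = succ.get(curr, 0) + 1

    def recommend(self, curr):
        succ = self.counts.get(curr)
        return max(succ, key=succ.get) if succ else None

    def on_order(self, products):
        self.orders_seen += len(products)  # order tracking event


def sample(dist, rng):
    """Draw one key of dist according to its probabilities."""
    items, weights = zip(*dist.items())
    return rng.choices(items, weights=weights)[0]


def simulate(model, basket_prob, recommender, n_sessions, seed=0):
    rng = random.Random(seed)
    totals = {"clicks": 0, "baskets": 0, "orders": 0, "accepted": 0}
    for _ in range(n_sessions):
        prev, rec, basket = None, None, []
        curr = sample(model[(S_G, None)], rng)  # first product via s_G
        while curr != S_A:
            totals["clicks"] += 1
            if rec is not None and rec == curr:
                totals["accepted"] += 1  # the recommendation was followed
            if rng.random() < basket_prob.get(curr, 0.0):
                basket.append(curr)
                totals["baskets"] += 1
            recommender.learn(prev, curr)      # RE learns ...
            rec = recommender.recommend(curr)  # ... and recommends
            prev = curr
            # Use the recommendation-specific distribution if one exists.
            curr = sample(model.get((curr, rec), model[(curr, None)]), rng)
        if basket:
            recommender.on_order(basket)
            totals["orders"] += len(basket)
    return totals


model = {
    (S_G, None): {"A": 1.0},
    ("A", None): {"B": 1.0},
    ("B", None): {S_A: 1.0},
}
totals = simulate(model, {"B": 1.0}, CountingRecommender(), n_sessions=10)
```

The dictionary `totals` plays the role of the cumulated values collected over all virtual sessions; a cumulated reward or revenue could be accumulated in the same way once a reward definition is fixed.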

An important aspect of the analysis of the extended simulation is that, unlike in
the first simulation of Sect. 4.4, the prediction rates no longer play the central role.
Instead, the main characteristics are now the cumulated values, i.e., the cumulated
reward over all sessions and the cumulated numbers of clicks, baskets, and orders, and, last
but not least, the cumulated revenue. In the next section we present the results of the
extended simulation for the P- and DP-Version introduced in Sects. 5.1 and 5.2 using
an artificial and a real-life data set, respectively. There, we will state the values of the
cumulated rewards only, since they are evidently correlated, depending on
their definition, with shop characteristics such as clicks and baskets.
