Fig. 5.5 Sequence of products and multiple recommendations as states and actions extended by
generating node s_G
[Diagram labels: states s_G, s_1, s_2, s_3, ..., s_{n-1}, s_n, s_A; actions a_1, a_2, a_3, ..., a_{n-2}, a_{n-1}]
the absorbing state s_A. In some sense it is placed before the beginning of each
session, and the corresponding transition probability p_{s_G s} specifies the probability
that product s will be selected as the first product of the session (Fig. 5.5).
From a technical point of view, the transitions from the generating state and into the
absorbing state can conveniently be stored as rules s → s' in the same way as all
other product transitions. Thus, the rules s_G → s represent the transition from the
generating state into state s, and the rules s → s_A the transition from state s into the
absorbing state.
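
As an illustration of this storage scheme, the following minimal sketch turns sessions into rules s → s' and counts them. The sentinel identifiers `S_G` and `S_A` and the function names are hypothetical choices made for this example, not taken from the text.

```python
from collections import defaultdict

# Hypothetical sentinel identifiers for the generating and absorbing states.
S_G = "__generating__"
S_A = "__absorbing__"


def session_to_rules(products):
    """Return the rules s -> s' of one session as (s, s') pairs.

    The session is framed by the generating state in front and the
    absorbing state at the end, so both are stored like ordinary
    product transitions.
    """
    path = [S_G, *products, S_A]
    return list(zip(path, path[1:]))


def count_rules(sessions):
    """Count how often each rule occurs; empty sessions are skipped."""
    counts = defaultdict(int)
    for products in sessions:
        if not products:
            continue
        for rule in session_to_rules(products):
            counts[rule] += 1
    return dict(counts)
```

Normalizing the counts of all rules that start in s_G would then give an estimate of the probability that a product is the first one of a session.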
The flow diagram of the extended simulation is depicted in Fig. 5.6b, whereas
Fig. 5.6a shows the basic simulation of Sect. 4.4 for comparison.
In the first phase of the extended simulation, we go through the historical
transaction data and estimate the transition probabilities and rewards, which form
the model of the environment. In contrast to the simulation of Sect. 4.4, here we do
not estimate the unconditional probabilities p_{ss'} but the complete p^a_{ss'}, which
incorporate the influence of (multiple) recommendations.
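
To make the first phase concrete, here is a minimal sketch that estimates recommendation-conditioned transition probabilities and mean transition rewards by relative frequencies. This is a simple frequency estimator chosen for illustration, not necessarily the estimator used for the P- and DP-Version. The input format (a list of `(state, recommendations, next_state, reward)` tuples) and the function name `estimate_model` are assumptions.

```python
from collections import Counter, defaultdict


def estimate_model(steps):
    """Estimate p^a_{ss'} and mean rewards from logged steps.

    steps: iterable of (state, recommendations, next_state, reward).
    The recommendations shown in a step are treated as an unordered
    set (an assumption of this sketch).
    """
    counts = defaultdict(Counter)      # (s, a) -> Counter of s'
    reward_sums = defaultdict(float)   # ((s, a), s') -> summed reward
    for state, recs, next_state, reward in steps:
        key = (state, frozenset(recs))
        counts[key][next_state] += 1
        reward_sums[(key, next_state)] += reward

    probs, rewards = {}, {}
    for key, successors in counts.items():
        total = sum(successors.values())
        # Relative frequency of each successor given state and action.
        probs[key] = {s2: c / total for s2, c in successors.items()}
        # Average observed reward per transition.
        rewards[key] = {
            s2: reward_sums[(key, s2)] / c for s2, c in successors.items()
        }
    return probs, rewards
```

Keying the counts by state and recommendation set is what distinguishes this from an unconditional estimate of p_{ss'}, which would key by state alone.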
This is very important because it allows us to run the actual simulation in the
second phase under quite realistic conditions. At the beginning of each session, we
determine the initial (visited) product by means of the generating node. This product is
passed to the RE algorithm, which learns and at the same time delivers recommendations.
Based on the current product and the recommendations, the simulation
environment determines the next product and decides whether the product will be
added to the basket and ordered at the end of the session. This information (including
the basket event) is in turn passed to the RE algorithm, which learns again and
returns new recommendations, and so on. As soon as the transition into the absorbing state
takes place, the session terminates. Before termination, where applicable, the products
marked for purchase are ordered, and this information is also passed to the RE
algorithm as a tracking event. Then the next session starts. After the specified number
of virtual sessions has been reached, the simulation terminates.
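
The second phase can be sketched as the following loop. Everything here is a simplified assumption made for illustration: the `RandomRecommender` stand-in for the RE algorithm, the `toy_next_probs` environment model, the constant basket and order probabilities, and the `max_steps` safety cap. Revenue and rewards are left out to keep the sketch short.

```python
import random

S_A = "__absorbing__"  # hypothetical identifier of the absorbing state


class RandomRecommender:
    """Hypothetical stand-in for the RE algorithm: records events, recommends at random."""

    def __init__(self, products, k, rng):
        self.products, self.k, self.rng = list(products), k, rng
        self.events = []

    def observe(self, event):
        self.events.append(event)  # a real RE would learn here

    def recommend(self, product):
        return self.rng.sample(self.products, self.k)


def sample(dist, rng):
    """Draw one key of dist with probability proportional to its value."""
    return rng.choices(list(dist), weights=list(dist.values()), k=1)[0]


def toy_next_probs(product, recs):
    """Toy model: leave with probability 0.3, else follow a recommendation."""
    dist = {S_A: 0.3}
    for r in recs:
        dist[r] = dist.get(r, 0.0) + 0.7 / len(recs)
    return dist


def simulate(start_probs, next_probs, recommender, n_sessions, rng,
             basket_prob=0.2, order_prob=0.5, max_steps=100):
    totals = {"clicks": 0, "baskets": 0, "orders": 0}
    for _ in range(n_sessions):
        product = sample(start_probs, rng)  # transition s_G -> s
        basket, steps = [], 0
        while product != S_A and steps < max_steps:
            steps += 1
            totals["clicks"] += 1
            in_basket = rng.random() < basket_prob
            if in_basket:
                basket.append(product)
                totals["baskets"] += 1
            # The RE sees the click (with basket flag), then recommends.
            recommender.observe({"type": "click", "product": product,
                                 "basket": in_basket})
            recs = recommender.recommend(product)
            product = sample(next_probs(product, recs), rng)
        # Before the session ends, basket products may be ordered.
        if basket and rng.random() < order_prob:
            totals["orders"] += len(basket)
            recommender.observe({"type": "order", "products": list(basket)})
    return totals
```

A run could look like `simulate({"p1": 0.5, "p2": 0.5}, toy_next_probs, RandomRecommender(["p1", "p2", "p3"], 2, rng), 1000, rng)` with `rng = random.Random(0)`. Deciding the basket event for every visited product, including the first one, is a simplification of this sketch.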
An important aspect of the analysis of the extended simulation is that, unlike in
the first simulation of Sect. 4.4, the prediction rates no longer play the central role.
Instead, the main characteristics are now the cumulated values, i.e., the cumulated
reward over all sessions and the cumulated numbers of clicks, baskets, and orders and, last
but not least, the cumulated revenue. In the next section we present the results of the
extended simulation for the P- and DP-Version introduced in Sects. 5.1 and 5.2 using
an artificial and a real-life data set, respectively. We will state only the values of the
cumulated rewards, since, depending on how the rewards are defined, they are evidently
correlated with shop characteristics such as clicks and baskets.
 