4.1 Basic Approach
As described initially, each product view represents a state $s$, and each recommendation
of another product represents an action $a$. Each web session (session for short)
forms an episode.
Consequently, the interaction between the user and the recommendation engine in
each web session can be regarded as a sequence of product transitions under the
influence of recommendations (Fig. 4.2).
This permits us to model the most important statistical characteristics, such as
action values, transition probabilities, and rewards, using rules $s \to s'$, which can be
stored, for instance, in files or database tables (we will explain the details of this
later). Of course, not every action necessarily leads to an accepted recommendation:
the user can also ignore the recommendations and go to an entirely unrelated
product. In this case, however, the product transition is added as a new rule to the
rule base and thus provides a new potential action.
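To make the rule idea concrete, here is a minimal Python sketch under simplifying assumptions of our own: each session is given as its ordered list of viewed product IDs, every rule $s \to s'$ just stores how often the transition was observed, and transition probabilities are estimated as relative frequencies. The names `session_to_transitions` and `RuleBase` and the label `s_A` are hypothetical, and the estimator is only an illustration, not the scheme described later.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

ABSORBING = "s_A"  # hypothetical label for the absorbing end-of-session state


def session_to_transitions(views: List[str]) -> List[Tuple[str, str]]:
    """Turn the ordered product views of one session (one episode) into
    product transitions s -> s', closing with the absorbing state."""
    transitions = [(views[k], views[k + 1]) for k in range(len(views) - 1)]
    if views:
        transitions.append((views[-1], ABSORBING))
    return transitions


class RuleBase:
    """Hypothetical in-memory rule base: one rule per observed transition s -> s'."""

    def __init__(self) -> None:
        # counts[s][s_next] = how often the transition s -> s_next was observed
        self.counts: Dict[str, Dict[str, int]] = defaultdict(lambda: defaultdict(int))

    def add_session(self, views: List[str]) -> None:
        # Every observed transition becomes (or reinforces) a rule, whether or not
        # it followed a recommendation; an unseen one adds a new potential action.
        # Transitions into s_A are counted too, so session ends affect the estimates.
        for s, s_next in session_to_transitions(views):
            self.counts[s][s_next] += 1

    def successors(self, s: str) -> List[str]:
        # .get() avoids creating empty entries through the defaultdict
        return sorted(self.counts.get(s, {}))

    def transition_probability(self, s: str, s_next: str) -> float:
        # Relative-frequency estimate of P(s_next | s); 0.0 for an unknown state
        row = self.counts.get(s, {})
        total = sum(row.values())
        return row.get(s_next, 0) / total if total else 0.0


rules = RuleBase()
rules.add_session(["p1", "p7", "p3"])
rules.add_session(["p1", "p7"])
rules.add_session(["p1", "p2"])
print(rules.successors("p1"))                    # ['p2', 'p7']
print(rules.transition_probability("p1", "p7"))  # 0.6666666666666666
```

Here the third session introduces the new rule p1 → p2, which immediately becomes a further candidate action in state p1.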
Since all actions $a$ represent product views, the sets of states $S$ and actions $A$ are
isomorphic:

$$S \cong A. \qquad (4.1)$$
It should be noted that, for reasons of complexity, not all actions in $A$ are considered
in each state $s$, but only a subset $A(s)$, which initially contains all product
transitions that have actually occurred, together with actions derived by other
means such as hierarchies (Chap. 6). In this way, the action set $A(s)$ expands
dynamically in the course of the learning process.
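The construction of $A(s)$ can be sketched as a set union. This is only an illustration under a loud assumption: the hierarchies of Chap. 6 are replaced here by a hypothetical flat `category` map, and products sharing the category of $s$ are added as derived actions. The function name `action_set` and its parameters are likewise hypothetical.

```python
from typing import Dict, Set


def action_set(s: str,
               observed: Dict[str, Set[str]],
               category: Dict[str, str]) -> Set[str]:
    """Candidate action set A(s) for state s (illustrative sketch).

    observed: successor products of s that users actually visited.
    category: hypothetical product -> category map standing in for the
              hierarchies of Chap. 6; products in s's category are added.
    """
    actions = set(observed.get(s, set()))
    cat = category.get(s)
    if cat is not None:
        actions |= {p for p, c in category.items() if c == cat and p != s}
    return actions


observed = {"p1": {"p7"}}
category = {"p1": "shoes", "p2": "shoes", "p7": "socks"}
print(sorted(action_set("p1", observed, category)))  # ['p2', 'p7']

observed["p1"].add("p9")  # a new transition is observed during learning
print(sorted(action_set("p1", observed, category)))  # ['p2', 'p7', 'p9']
```

The second call shows the dynamic growth of $A(s)$: once a new transition out of p1 is observed, it becomes part of the action set without any change to the derived actions.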
In accordance with (4.1), we introduce the notation $s_a$ for the product associated
with the recommendation $a$ (i.e., the recommended product “a”). Conversely,
$a_s$ denotes the recommendation associated with the product $s$. The successor
states corresponding to the recommendations of the action set $A(s)$ are denoted by
$S_{A(s)}$, and thus the isomorphism (4.1) also applies to the recommendations of any
state $s$: $S_{A(s)} \cong A(s)$. The number of recommendations in $s$, i.e., the cardinality of
the action set, $m = |A(s)|$, is usually denoted by $m$.
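The notation $s_a$, $a_s$, and $S_{A(s)}$ can be mirrored directly in code. The sketch below assumes a hypothetical `Recommendation` type that simply wraps a product ID, so that the isomorphism amounts to wrapping and unwrapping; the helper names `product_of`, `recommendation_of`, and `successor_states` are illustrative, not part of the original text.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class Recommendation:
    """Action a: the recommendation of one product (hypothetical type)."""
    product: str


def product_of(a: Recommendation) -> str:
    """s_a: the product (state) associated with recommendation a."""
    return a.product


def recommendation_of(s: str) -> Recommendation:
    """a_s: the recommendation associated with product s."""
    return Recommendation(s)


def successor_states(actions: FrozenSet[Recommendation]) -> FrozenSet[str]:
    """S_{A(s)}: the successor states of the action set A(s)."""
    return frozenset(product_of(a) for a in actions)


A_s = frozenset({recommendation_of("p2"), recommendation_of("p7")})
m = len(A_s)  # number of recommendations in s, i.e., |A(s)|
print(sorted(successor_states(A_s)), m)  # ['p2', 'p7'] 2
```

Because `frozen=True` makes `Recommendation` hashable and comparable by value, mapping a state to its action and back yields an equal object, which is exactly the one-to-one correspondence expressed by $S_{A(s)} \cong A(s)$.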
At each step, the recommendation engine (RE) receives a reward $r$. The goal is to maximize
the sum of all rewards over the complete session. The reward is defined for each step as
follows: if a product $s$ is placed in the shopping basket or is bought, the preceding
action (i.e., the “recommendation” $a$ that led to the product) receives the
Fig. 4.2 Sequence of products and recommendations as states and actions including absorbing
state $s_A$