value of the product s (price, revenue, etc.) as its reward; otherwise, it receives a
small click reward, close to 0. This reflects the primary goal of seeking to maximize
the shopping basket values or the sales/revenue. Note that orders constitute a
delayed reward, since, in most cases, they appear only at the end of a session.
The definition of the correct reward is linked to various refinements that will not be
further explored here.
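The reward scheme described above can be sketched in a few lines. This is only an illustration under stated assumptions: the `Event` record, its field names, and the value `CLICK_REWARD = 0.01` are hypothetical (the text says only that the click reward is "close to 0").

```python
from dataclasses import dataclass

# Assumed value: the text only requires a small reward close to 0.
CLICK_REWARD = 0.01


@dataclass(frozen=True)
class Event:
    """Hypothetical record of one step in a session."""
    product_id: str
    ordered: bool
    order_value: float = 0.0  # price or revenue of the product, if ordered


def reward(event: Event) -> float:
    """Order value if the product was ordered, otherwise the small click reward."""
    return event.order_value if event.ordered else CLICK_REWARD


# A session in which the only order arrives at the end (a delayed reward).
session = [Event("A", False), Event("B", False), Event("B", True, 49.90)]
rewards = [reward(e) for e in session]
```

In this sketch nearly all of the session's return comes from the final step, which is what makes the reward delayed.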
We now come to the statistical characteristics. Let us state our first fundamental
assumption:
Assumption 4.1 (Markov property for REs): In every state s, the optimal
action a, i.e., the best recommendation, depends solely on the current state s,
i.e., the product under consideration.
Of course, this Markov property for REs is satisfied only incompletely, since the
best recommendation also depends on the preceding states of s together with their
transactions. Nevertheless, for the evaluation of a recommendation by the user, the
product currently viewed plays the main role, so the assumption may be considered
reasonable. (There is also compelling empirical evidence on this point, namely,
classic cross-selling, which is described using precisely this form of rules and
whose effectiveness is beyond doubt.)
As a further simplification, let us assume that the reward in the state transition
from s to s' is independent of the influence of the action a:
Assumption 4.2 (Reward property for REs): For each state transition from
s to s', the obtained reward $r^a_{ss'}$ is independent of the action a.
This means that
$$r^a_{ss'} = r_{ss'}. \tag{4.2}$$
In fact, it can be assumed that the user's decision as to whether or not to place a
product in the shopping basket depends primarily on the product itself and not on
the preceding recommendation. Thus, technically, the estimated reward can be
stored as an attribute of the rule s → s'.
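One way to picture storing the reward as an attribute of a rule is a table keyed by the pair (s, s') alone. The sketch below is an assumption-laden illustration, not the book's implementation: the class `RuleRewards`, its method names, and the running-mean estimator are all hypothetical choices.

```python
from collections import defaultdict


class RuleRewards:
    """Hypothetical store of estimated rewards, one entry per rule s -> s'."""

    def __init__(self):
        self._sum = defaultdict(float)
        self._count = defaultdict(int)

    def update(self, s, s_next, reward, action=None):
        # The action is accepted but deliberately ignored: under
        # Assumption 4.2 the reward depends only on the transition (s, s').
        key = (s, s_next)
        self._sum[key] += reward
        self._count[key] += 1

    def estimate(self, s, s_next):
        """Mean observed reward for the rule, or 0.0 if it was never seen."""
        key = (s, s_next)
        if key not in self._count:
            return 0.0
        return self._sum[key] / self._count[key]


rules = RuleRewards()
rules.update("A", "B", 10.0, action="B")  # B was recommended
rules.update("A", "B", 20.0, action="C")  # C was recommended, user still chose B
mean_ab = rules.estimate("A", "B")
```

Both observations land in the same entry regardless of which recommendation preceded them, so the table needs one value per rule rather than one per (rule, action) combination.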
The action-value function q(s,a) assigns the expected return, i.e., the expected
sales over the remainder of the session, to each product s and to each of its
recommendations a. Technically, q(s,a) can thus also be represented by the rule
s → s_a from product s to the recommended product s_a.
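As a rough illustration of q(s,a) held as rule attributes, the sketch below keeps one value per rule s → s_a and picks the recommendation with the highest value for the current product. The product identifiers, the numeric values, and the helper `best_recommendation` are hypothetical; how the values are actually estimated is not covered here.

```python
# Hypothetical action values: key (s, s_a) is the rule from product s
# to recommended product s_a, value is the expected return q(s, a).
q = {
    ("A", "B"): 12.5,
    ("A", "C"): 30.0,
    ("B", "C"): 4.0,
}


def best_recommendation(q, s):
    """Return the recommended product with the highest q(s, a), or None."""
    # Under Assumption 4.1 only the current product s is consulted,
    # not the earlier states of the session.
    candidates = {s_a: value for (state, s_a), value in q.items() if state == s}
    if not candidates:
        return None
    return max(candidates, key=candidates.get)


choice_for_a = best_recommendation(q, "A")
choice_for_unknown = best_recommendation(q, "Z")
```

A purely greedy choice like this one ignores exploration; it is meant only to show how the rule representation is read.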
There remains the question of the transition probabilities $p^a_{ss'}$. This is a
complicated subject, which we shall consider in depth in Chap. 5.