We conclude this section with the formulation of the action-value function for
multiple recommendations. From (5.16) it follows that (5.6) takes the form

$$q^{\pi}(s,a) = \sum_{s' \in S_a} p^{a}_{ss'}\, r^{a}_{ss'} + c(s,a) \sum_{s' \neq s_a} p_{ss'}\, r_{ss'}.$$

Due to the nonlinearity of $c(s,a)$, this means that we would need to consider all
possible combinations of recommendations a in order to find the highest action
value. This would result in a complexity too high to handle for
real-life problems. For the moment, we use the simplest approach and select the
k best single recommendations, as in the linear case. This approach, however, is
exact only for the simplified DP version (5.26). (The simplified DP version
will be introduced in the next section.)
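The trade-off described above can be illustrated with a minimal Python sketch. It assumes that action values for single recommendations are already available in a dictionary; the names `q_single`, `top_k_single_recommendations`, and `exhaustive_search_size` are hypothetical and the numbers are made up for illustration. The sketch only shows the selection heuristic and the size of the search space that an exhaustive search over combinations would face; it does not compute the nonlinear action value itself.

```python
import heapq
from math import comb


def top_k_single_recommendations(q_single, k):
    """Return the k actions with the highest single-recommendation values.

    q_single maps each candidate action to its (assumed, precomputed)
    action value for a single recommendation in the current state.
    """
    return heapq.nlargest(k, q_single, key=q_single.get)


def exhaustive_search_size(n_actions, k):
    """Number of k-element combinations an exhaustive search would evaluate."""
    return comb(n_actions, k)


# Illustrative values only; not taken from the text.
q_single = {"a1": 0.42, "a2": 0.10, "a3": 0.35, "a4": 0.27}
best = top_k_single_recommendations(q_single, k=2)
print(best)                              # k best single recommendations
print(exhaustive_search_size(1000, 3))   # combinations for 1000 candidates, k = 3
```

The heuristic costs one pass over the candidates, whereas the number of combinations grows rapidly with the number of candidates and with k.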
So although we are able to correctly determine all transition probabilities for
multiple recommendations and to use them accurately in the simulations (Sect. 5.4),
we still lack a computationally efficient approach to calculate the best multiple
recommendations. This should be a subject for future studies; special
optimization techniques should be able to solve this problem.
Comparison of Linear and Nonlinear Approaches
Summing up, the linear approach seems preferable to the nonlinear one
because it is easier to implement and leaves no fundamental problems open.
5.3 Combination of Conditional and Unconditional Approaches
In this section, we present a first approach to dealing with uncertain data.
So far, we have always assumed that all estimated probabilities $p^{(a)}_{ss'}$, i.e., $p^{a}_{ss'}$ and
$p_{ss'}$, are equally reliable. However, in most applications, new transitions
$s \to s'$ (usually represented as rules) are dynamically added for a state $s$ during the process of
learning. For example, if $s$ is a long-standing product of a web shop and a
new product $s''$ was recently added to the assortment of the shop, then we may
dynamically add the transition probabilities $p^{(a)}_{ss''}$ to the existing ones.
Let ${}^{j}p^{(a)}_{ss'}$ represent the estimated probabilities $p^{(a)}_{ss'}$ after $j$ update steps. Then the
described dynamic approach means that different target states $s'$ may have different
counter values $j$. Obviously, for a state $s_1$ with a large counter $j_1$, the
corresponding transition probabilities ${}^{j_1}p^{(a)}_{ss_1}$ can in general be considered more reliable than
${}^{j_2}p^{(a)}_{ss_2}$ for a state $s_2$ with a small counter $j_2$.
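The role of the counters can be made concrete with a small Python sketch. It is a sketch under stated assumptions, not the method of the text: the update rule (an incremental mean over 0/1 observations) and the names `TransitionEstimate`, `StateTransitions`, and `observe` are hypothetical. The sketch only shows how a target state that is added later ends up with a smaller counter than a long-standing one.

```python
from dataclasses import dataclass, field


@dataclass
class TransitionEstimate:
    """Estimated probability of one transition s -> s' and its counter j."""
    probability: float = 0.0
    counter: int = 0  # number of update steps j seen by this estimate

    def update(self, occurred):
        # Assumed rule: incremental mean of 0/1 observations.
        self.counter += 1
        self.probability += (float(occurred) - self.probability) / self.counter


@dataclass
class StateTransitions:
    """All known target states of one fixed source state s."""
    targets: dict = field(default_factory=dict)

    def observe(self, next_state):
        # A previously unseen target is added dynamically with counter 0.
        if next_state not in self.targets:
            self.targets[next_state] = TransitionEstimate()
        # Every known target receives one update step.
        for target, estimate in self.targets.items():
            estimate.update(target == next_state)


transitions = StateTransitions()
for nxt in ["s1", "s1", "s2", "s1"]:
    transitions.observe(nxt)

for target, est in transitions.targets.items():
    print(target, est.counter, round(est.probability, 3))
```

In this toy run the target `s2` enters after two steps, so its estimate rests on fewer updates than that of `s1`, which is the situation the counters are meant to capture.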
In order to calculate (5.8) meaningfully, the transition probabilities $p^{(a)}_{ss'}$ must be
statistically stable, at least to some extent. In other words, adding transition