Rather than only considering states of single products p_i where s_i = {p_i}, as
before, the simplest approach is to switch to short product sequences of
the l previous products (or, more precisely, product views) in the episode and to
include these as equally valid states s_j = {p_j, p_{j-1}, ..., p_{j-l+1}} in the state space
S. Because of its complexity, this approach, which can also be found in the
literature (for RL in [SHB05]), is of course very limited and can only reasonably
be used for small values of l (usually 2 or 3). At most it represents a small
expansion of the existing approach, but it does not solve the crux of the problem.
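The construction of such sequence states can be sketched as follows; the function name and the episode data are hypothetical, chosen only for illustration. Note that the number of possible states grows like |P|^l, which is why only small values of l are practical:

```python
def sequence_states(episode, l=2):
    """Build length-l sequence states s_j = (p_j, p_{j-1}, ..., p_{j-l+1})
    from an episode of product views (most recent product first)."""
    states = []
    for j in range(len(episode)):
        if j + 1 >= l:  # need l previous views to form a full state
            states.append(tuple(episode[j - k] for k in range(l)))
    return states

views = ["p1", "p2", "p3", "p4"]
print(sequence_states(views, l=2))
# [('p2', 'p1'), ('p3', 'p2'), ('p4', 'p3')]
```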
Probably the most promising route is to factorize the action-value and state-
value functions and in the model-based case additionally the transition probabilities
or rewards. This tensor factorization not only allows a theoretically unlimited
number of new dimensions to be included but also makes it possible to regularize
transition probabilities in particular. We proceed as follows: in this chapter we will
introduce the tensor factorization, especially in its adaptive form, and combine it
with reinforcement learning in Chap. 10.
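To give a first impression of the idea, the following sketch shows a factorized action-value function: instead of storing a full table Q(s, a), each state and each action receives a small latent-factor vector, and Q(s, a) is approximated by their inner product. All sizes and the update rule below are illustrative assumptions, not the method developed in Chap. 10:

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions, k = 100, 50, 8  # hypothetical sizes; k = number of latent factors

# Instead of a full n_states x n_actions table of Q-values (5,000 entries),
# we store two small factor matrices (100*8 + 50*8 = 1,200 parameters).
U = rng.normal(scale=0.1, size=(n_states, k))   # state factors
V = rng.normal(scale=0.1, size=(n_actions, k))  # action factors

def q_value(s, a):
    # Q(s, a) approximated by the inner product of the latent factors
    return float(U[s] @ V[a])

def td_update(s, a, reward, s_next, alpha=0.05, gamma=0.9):
    # One temporal-difference step on the factorized Q-function;
    # it touches only the 2*k parameters belonging to s and a.
    target = reward + gamma * max(q_value(s_next, b) for b in range(n_actions))
    err = target - q_value(s, a)
    U[s], V[a] = U[s] + alpha * err * V[a], V[a] + alpha * err * U[s]
```

The regularizing effect mentioned above stems from exactly this parameter sharing: every observed transition updates factors that are reused across many (s, a) pairs.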
...
8.1 Matrix Factorizations in Data Mining and Beyond
In classical data mining, approaches based on matrix factorization are ubiquitous.
Typically, they arise as mathematical core problems in collaborative filtering
(CF). A classical application of CF to recommendation engineering is the
prediction of product ratings by users. Unlike in classical CF, we shall use
sessions instead of users (from a mathematical point of view, this does not
make any difference) for consistency reasons in the following. Like RL, CF is
behavioristic in the sense that no background information on either users or
products is involved.
Instead, we associate with each session a list assigning to each product the
rating given by the user. These ratings may be explicit, e.g., users may be
prompted to rate each visited product on a scale from 1 to 5, or, more commonly,
implicit (as before in this book). As for the latter, one may, for instance,
endow each type of customer transaction with a score value, say, 1 for a click,
5 for an “add to cart” or “add to wish list,” and 10 for actually buying the
product. We consider this list as a signal or, simply, a vector. Inspired by noise
reduction and deconvolution techniques in signal processing, most CF
approaches are based on the assumption that the data arising in this way are
noise-afflicted observations of intrinsically low-dimensional signals generated by some
unknown source. How shall we proceed to formalize the situation statistically?
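The assignment of implicit scores to a session can be sketched as follows, using the score values named above; the function, the transaction encoding, and the tie-breaking rule (keep the strongest signal per product) are illustrative assumptions:

```python
# Score values from the text: click = 1, add to cart / wish list = 5, buy = 10
SCORES = {"click": 1, "add_to_cart": 5, "add_to_wishlist": 5, "buy": 10}

def session_vector(transactions, products):
    """Turn a session's transactions into a rating vector over the product
    catalogue; products never touched stay None (the 'unknown values' below)."""
    vec = {p: None for p in products}
    for product, action in transactions:
        score = SCORES[action]
        # keep only the strongest signal observed for each product
        if vec[product] is None or score > vec[product]:
            vec[product] = score
    return vec

v = session_vector([("p1", "click"), ("p2", "add_to_cart"), ("p2", "buy")],
                   ["p1", "p2", "p3"])
print(v)
# {'p1': 1, 'p2': 10, 'p3': None}
```

Stacking these vectors over all sessions yields the partially observed session-product matrix whose unknown entries are the central difficulty discussed next.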
The decisive obstacle is the mathematical treatment of the unknown values.
Basically, this may be surmounted in two different manners: the ostensibly more
sophisticated approach consists in modeling the unknown ratings as hidden
variables, which need to be estimated along with the underlying source. Putting
it in terms of signal processing, this gives rise to a problem related to the
reconstruction of a partially observed signal. Dealing with hidden variables in