It turns out that the adaptive algorithms for the matrix case carry over rather
smoothly to the higher-order setting.
• Most importantly, our research in Chap. 10 will focus on a combination of adaptive CF and RL. Specifically, we aim to apply (tensor) CF to the transition probability matrices (or tensors) of Markov decision processes as a means of approximation and regularization, rendering model-based methods tractable for large state spaces.
In what follows, the above points will be discussed in more detail.
8.2 Collaborative Filtering
Let us, for the sake of completeness, first address classical collaborative filtering. The first CF system was the Information Tapestry project at Xerox PARC in the early 1990s [GNBT92]. In its further development, the GroupLens research group played an important part.
In classical CF, the identity "decomposition" of the matrix in (8.1) is carried out; that is, X is taken to be A itself and Y to be the identity matrix. Thus, the "low-dimensional" subspace is simply the data space. Intuitively, there is no decomposition whatsoever: all sessions are stored as they are.
Having clarified the factorization, we shall briefly address how it is used for
computation. Let N(p) be the set of all sessions t that contain the considered product p (i.e., in which p has been clicked at least once). Then the predicted reward value $\hat{a}_{ps}$ of a product p not observed in the session s is computed as follows:

$$\hat{a}_{ps} = b_{ps} + \frac{\sum_{t \in N(p)} s_{st}\,(a_{pt} - b_{pt})}{\sum_{t \in N(p)} s_{st}},$$
where $b_{ps}$ is the baseline prediction for $a_{ps}$ (e.g., the mean value of the rewards of all products in the session) and $s_{st}$ is a measure of similarity between the sessions s and t. Frequently employed measures of similarity are the cosine measure and the Pearson correlation; see below.
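For reference, a standard formulation of these two measures, evaluated over the products observed in both sessions, reads as follows (the book's own definitions follow below):

$$s_{st}^{\cos} = \frac{\sum_{p} a_{ps}\, a_{pt}}{\sqrt{\sum_{p} a_{ps}^{2}}\,\sqrt{\sum_{p} a_{pt}^{2}}}, \qquad s_{st}^{\mathrm{Pearson}} = \frac{\sum_{p} (a_{ps} - \bar{a}_{s})(a_{pt} - \bar{a}_{t})}{\sqrt{\sum_{p} (a_{ps} - \bar{a}_{s})^{2}}\,\sqrt{\sum_{p} (a_{pt} - \bar{a}_{t})^{2}}},$$

where $\bar{a}_{s}$ denotes the mean reward of session s.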
What does this formula mean? For the prediction $\hat{a}_{ps}$ of product p in session s, we choose all previously observed sessions that contain product p and compute the weighted mean of their rewards for the product p, where the weights are given by the similarities to the session s. That is, if a session u is closer to session s than another session t, then the reward $a_{pu}$ makes a greater contribution than the reward $a_{pt}$.
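To make the computation concrete, here is a minimal Python sketch of this neighborhood prediction. It assumes the reward matrix A is stored as a dense products × sessions NumPy array with zeros marking unobserved entries and the baselines precomputed in a matrix B of the same shape; the function names and the choice of the cosine measure are illustrative, not the book's implementation:

```python
import numpy as np

def cosine_similarity(x, y):
    """Cosine similarity between two session reward vectors,
    restricted to the products observed (nonzero) in both sessions."""
    mask = (x != 0) & (y != 0)
    if not mask.any():
        return 0.0
    xm, ym = x[mask], y[mask]
    denom = np.linalg.norm(xm) * np.linalg.norm(ym)
    return float(xm @ ym / denom) if denom > 0.0 else 0.0

def predict_reward(A, B, s, p):
    """Predict a_ps for product p in session s via
    a_ps_hat = b_ps + sum_{t in N(p)} s_st (a_pt - b_pt) / sum_{t in N(p)} s_st.

    A -- products x sessions reward matrix (0 = not observed)
    B -- baseline predictions b_pt, same shape as A
    """
    num = den = 0.0
    for t in range(A.shape[1]):
        if t == s or A[p, t] == 0:  # restrict to N(p), the sessions containing p
            continue
        s_st = cosine_similarity(A[:, s], A[:, t])
        num += s_st * (A[p, t] - B[p, t])
        den += s_st
    if den == 0.0:
        return B[p, s]  # no usable neighbors: fall back to the baseline
    return B[p, s] + num / den
```

Note that every single prediction scans all stored sessions, which already illustrates the scaling drawback discussed next.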
In practice, the described CF approach yields fairly good results, since, in particular, it takes the entire history of sessions into account. An essential drawback, however, is the poor scaling with respect to computing time as well as memory space. Indeed, we have to keep the entire matrix A in memory, and the