By defining the two factor matrices $L = US^{1/2}$ and $R = VS^{1/2}$, we easily observe that $\mathrm{trace}(W_1) + \mathrm{trace}(W_2) = \|L\|_F^2 + \|R\|_F^2$ and finally arrive at the optimization problem

$$\begin{aligned}
\text{minimize} \quad & \tfrac{1}{2}\bigl(\|L\|_F^2 + \|R\|_F^2\bigr) \\
\text{subject to} \quad & P_\Omega(LR^T) = P_\Omega(A)
\end{aligned} \qquad (8.35)$$
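The identity behind this reformulation can be verified numerically. The following sketch (my own illustration, not from the text) builds $L$ and $R$ from the SVD of a random matrix and checks that $LR^T$ reconstructs $A$ while $\tfrac{1}{2}(\|L\|_F^2 + \|R\|_F^2)$ equals the sum of singular values, i.e., the nuclear norm of $A$:

```python
import numpy as np

# Numerical check of the factorization L = U S^(1/2), R = V S^(1/2):
# L @ R.T reconstructs A, and (1/2)(||L||_F^2 + ||R||_F^2) equals the
# sum of singular values (the nuclear norm of A).
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 3))

U, s, Vt = np.linalg.svd(A, full_matrices=False)
L = U * np.sqrt(s)          # columns of U scaled by sqrt of singular values
R = Vt.T * np.sqrt(s)

print(np.allclose(L @ R.T, A))                                   # → True
print(np.isclose(0.5 * (np.sum(L**2) + np.sum(R**2)), s.sum()))  # → True
```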
(Please consult [CR08] for a proof of the equivalence of (8.34) and (8.35).)
Since in most practical applications the data is noisy, we will allow some approximation error on the observed entries, replacing (8.35) by the less rigid formulation

$$\begin{aligned}
\text{minimize} \quad & \tfrac{1}{2}\bigl(\|L\|_F^2 + \|R\|_F^2\bigr) \\
\text{subject to} \quad & \|P_\Omega(LR^T - A)\|_F^2 \le \sigma
\end{aligned} \qquad (8.36)$$

for a small positive $\sigma$. Thus, in Lagrangian form, we arrive at the formulation

$$\text{minimize} \quad \tfrac{1}{2}\,\|P_\Omega(LR^T - A)\|_F^2 + \lambda\bigl(\|L\|_F^2 + \|R\|_F^2\bigr) \qquad (8.37)$$
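As a sanity check, the objective in (8.37) can be evaluated directly with NumPy. This is a minimal sketch; the names `mmmf_objective`, `omega`, and `lam` are my own, not from the text:

```python
import numpy as np

def mmmf_objective(L, R, A, omega, lam):
    """Value of (8.37): half the squared error on the observed entries
    plus the Frobenius-norm regularizer on both factors."""
    residual = np.where(omega, L @ R.T - A, 0.0)   # P_Omega(L R^T - A)
    return 0.5 * np.sum(residual**2) + lam * (np.sum(L**2) + np.sum(R**2))

# Hypothetical 1x1 example: prediction 1, observed value 2, no regularization.
val = mmmf_objective(np.array([[1.0]]), np.array([[1.0]]),
                     np.array([[2.0]]), np.array([[True]]), lam=0.0)
print(val)   # → 0.5
```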
where $\lambda$ is the regularization parameter that controls the tolerance to noise in the data. Formulation (8.37) is also called maximum-margin matrix factorization (MMMF) and goes back to Nathan Srebro in 2005 [RS05]. Related work was also done by the group of Trevor Hastie [MHT10].
Problem (8.37) can be solved, e.g., by using a simple gradient descent method. Interestingly, the first SVD solution submitted to Netflix (by Simon Funk [Fun06], in 2006) used exactly this approach and achieved considerable progress. The final prize-winning solution was a mixture of hundreds of models, with the SVD playing a crucial role.
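A Funk-style stochastic gradient scheme for (8.37) can be sketched as follows. This is an illustrative implementation under my own naming and hyperparameters (step size, epochs, initialization scale), not the exact values Funk used: for each observed entry, the corresponding rows of $L$ and $R$ are nudged against the local prediction error plus the regularization gradient.

```python
import numpy as np

def sgd_mmmf(A, omega, k=2, lam=0.05, lr=0.01, epochs=200, seed=0):
    """Stochastic gradient steps on (8.37): for each observed entry (i, j),
    update the factor rows against the local error plus regularization."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    L = 0.1 * rng.standard_normal((m, k))
    R = 0.1 * rng.standard_normal((n, k))
    obs = np.argwhere(omega)          # list of observed index pairs (i, j)
    for _ in range(epochs):
        rng.shuffle(obs)
        for i, j in obs:
            err = A[i, j] - L[i] @ R[j]
            li = L[i].copy()          # use the old L[i] in the R update
            L[i] += lr * (err * R[j] - lam * L[i])
            R[j] += lr * (err * li - lam * R[j])
    return L, R

# Toy usage: recover a rank-1 matrix from roughly 70% of its entries.
rng = np.random.default_rng(1)
A = np.outer([1., 2., 3.], [1., 1., 2.])
omega = rng.random(A.shape) < 0.7
L, R = sgd_mmmf(A, omega, k=1, lam=0.0, lr=0.05, epochs=1000)
print(np.abs((L @ R.T)[omega] - A[omega]).max())
```

With enough epochs, the residual on the observed entries shrinks toward zero, while the unobserved entries are filled in by the low-rank structure.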
After all, one may ask: what is the difference between the formulation (8.37) and the truncated SVD (8.14) that we considered before? This brings us back to the introductory discussion in Sect. 8.1 about unknown values. The answer is that in (8.37), we assume only the entries $A_{ij}$, $(i, j) \in \Omega$, to be known. In contrast, in our previous formulation, we considered all entries $A_{ij}$, $(i, j) \notin \Omega$, to be zero. The Netflix competition, where all entries of $A$ on a test set (of actual ratings of the users) had to be predicted, is obviously a classic matrix completion problem, and hence (8.37) is certainly the right approach.
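The distinction can be made concrete in a few lines of NumPy. In this hypothetical 2×2 example (my own, not from the text), entry (1, 1) is unobserved and the rank-1 factors reproduce all observed entries exactly: the masked objective of (8.37) sees no error, while zero-filling the unknown entry incurs a penalty.

```python
import numpy as np

A = np.array([[4., 2.], [2., 1.]])
omega = np.array([[True, True], [True, False]])   # (1, 1) is unobserved
L = np.array([[2.], [1.]])                        # rank-1 factors: L @ R.T == A
R = np.array([[2.], [1.]])

# (8.37) style: unobserved entries contribute nothing to the error.
masked = np.sum(np.where(omega, L @ R.T - A, 0.0) ** 2)
# Truncated-SVD style: the unobserved entry is treated as a zero rating.
zero_filled = np.sum((L @ R.T - np.where(omega, A, 0.0)) ** 2)

print(masked, zero_filled)   # → 0.0 1.0
```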
In the case of matrix factorization for an actual recommendation task, like that of a user or session matrix in Example 8.1, or even the probability matrix $P$, the discussion is more complex. In fact, we may view all non-visited entries of the matrix as zero, since the user didn't show any interest in them. This justifies our