By defining the two factor matrices $L = US^{1/2}$ and $R = VS^{1/2}$, we easily observe that $\mathrm{trace}(W_1) + \mathrm{trace}(W_2) = \|L\|_F^2 + \|R\|_F^2$ and finally arrive at the optimization problem

$$\underset{L,R}{\text{minimize}} \quad \frac{1}{2}\left(\|L\|_F^2 + \|R\|_F^2\right) \quad \text{subject to} \quad P_\Omega(LR^T) = P_\Omega(A). \qquad (8.35)$$
(Please consult [CR08] for a proof of equivalence of (8.34) and (8.35).)
Since in most practical applications data is noisy, we will allow some approximation error on the observed entries, replacing (8.35) by the less rigid formulation

$$\underset{L,R}{\text{minimize}} \quad \frac{1}{2}\left(\|L\|_F^2 + \|R\|_F^2\right) \quad \text{subject to} \quad \|P_\Omega(LR^T - A)\|_F^2 \le \sigma \qquad (8.36)$$

for a small positive $\sigma$. Thus, in Lagrangian form, we arrive at the formulation
$$\underset{L,R}{\text{minimize}} \quad \frac{1}{2}\,\|P_\Omega(LR^T - A)\|_F^2 + \frac{\lambda}{2}\left(\|L\|_F^2 + \|R\|_F^2\right), \qquad (8.37)$$

where $\lambda$ is the regularization parameter that controls the noise in the data.
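To make the role of the projection operator concrete, the objective of (8.37) can be evaluated in a few lines of numpy: $P_\Omega$ simply zeroes out the unobserved entries before the Frobenius norm is taken. This is a minimal sketch; the toy ratings matrix, the mask, and the value of $\lambda$ are illustrative assumptions, not from the text.

```python
import numpy as np

# Hypothetical toy data: A holds ratings, mask marks the observed set Omega.
A = np.array([[5.0, 3.0, 0.0],
              [4.0, 0.0, 1.0]])
mask = np.array([[True, True, False],
                 [True, False, True]])

rng = np.random.default_rng(0)
k = 2                                   # target rank (assumed)
L = rng.standard_normal((2, k))
R = rng.standard_normal((3, k))
lam = 0.1                               # regularization parameter lambda (assumed)

def mmmf_objective(L, R, A, mask, lam):
    """(1/2) ||P_Omega(L R^T - A)||_F^2 + (lambda/2) (||L||_F^2 + ||R||_F^2)."""
    resid = (L @ R.T - A) * mask        # P_Omega: keep only the observed entries
    return 0.5 * np.sum(resid**2) + 0.5 * lam * (np.sum(L**2) + np.sum(R**2))

print(mmmf_objective(L, R, A, mask, lam))
```

Multiplying by the boolean mask implements $P_\Omega$ elementwise, so unobserved entries contribute nothing to the data-fit term.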
Formulation ( 8.37 ) is also called maximum-margin matrix factorization (MMMF)
and goes back to Nathan Srebro in 2005 [RS05]. Related work was also done by the
group of Trevor Hastie [MHT10].
Problem ( 8.37 ) can be solved, e.g., by using a simple gradient descent method.
Interestingly, the first SVD solution submitted to Netflix (by Simon Funk
[Fun06], in 2006) used exactly this approach and achieved considerable progress.
The final prize was won by a mixture of hundreds of models, with the SVD playing a
crucial role.
After all, one may ask: what is the difference between the SVD formulation (8.37) and the truncated SVD (8.14) that we considered before? This brings us back to the introductory discussion in Sect. 8.1 about unknown values. The answer is that in (8.37), we assume only the entries $A_{ij}$, $(i,j) \in \Omega$, to be known. In contrast, in our previous formulation, we considered all entries $A_{ij}$, $(i,j) \notin \Omega$, to be zero. The Netflix competition, where all entries of A on a test set (of actual ratings of the users) had to be predicted, is obviously a classic matrix completion problem, and hence (8.37) is certainly the right approach.
In the case of matrix factorization for an actual recommendation task, like that of a user or session matrix in Example 8.1, or even the probability matrix P, the discussion is more complex. In fact, we may view all non-visited entries of the matrix as zero, since the user didn't show any interest in them. This justifies our