of entries $A_{ij}$, $(i,j) \in \Omega$, where $\Omega$ is a subset of the complete set of $mn$ entries.
Clearly, this problem is ill posed: one cannot hope to guess the missing entries without making any assumptions about the matrix $A$.
Now suppose that the unknown matrix $A$ has low rank. In [CR08], Emmanuel Candès and Benjamin Recht showed that this assumption radically changes the problem, making the search for solutions meaningful. We follow the guidelines of [CR08, CT10].
For simplicity, assume that the rank-$r$ matrix $A$ is $n \times n$. Next, we define the orthogonal projection $P_\Omega : \mathbb{R}^{n \times n} \to \mathbb{R}^{n \times n}$ onto the subspace of matrices vanishing outside of $\Omega$ as

$$\left[P_\Omega(X)\right]_{ij} = \begin{cases} X_{ij}, & (i,j) \in \Omega, \\ 0, & \text{otherwise}, \end{cases} \qquad (8.29)$$
so that the information about $A$ is given by $P_\Omega(A)$. We want to recover the data matrix by solving the optimization problem
$$\text{minimize} \quad \operatorname{rank}(X) \quad \text{subject to} \quad P_\Omega(X) = P_\Omega(A), \qquad (8.30)$$
which is, in principle, possible if there is only one low-rank matrix with the given entries. Unfortunately, (8.30) is difficult to solve, as rank minimization is in general an NP-hard problem for which no known algorithm can solve instances in practical time beyond (roughly) $n \ge 10$.
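In code, the sampling operator $P_\Omega$ from (8.29) is simply an entrywise mask. A minimal NumPy sketch (the function and variable names are ours, not from the text):

```python
import numpy as np

def P_Omega(X, mask):
    """Project X onto the matrices vanishing outside Omega, as in (8.29).

    `mask` is a boolean array with True at the observed positions (i, j);
    all other entries are set to zero.
    """
    return np.where(mask, X, 0.0)

# Example: observe roughly half of the entries of a 4 x 4 matrix.
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
mask = rng.random((4, 4)) < 0.5
observed = P_Omega(A, mask)
```

As an orthogonal projection, `P_Omega` is idempotent: applying it to `observed` again changes nothing.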
Candès and Recht proved in [CR08] that, first, the matrix completion problem is not as ill posed as previously thought and, second, that exact matrix completion is possible by convex programming. To this end, they proposed to replace (8.30) by the nuclear norm problem
$$\text{minimize} \quad \|X\|_* \quad \text{subject to} \quad P_\Omega(X) = P_\Omega(A), \qquad (8.31)$$
where the nuclear norm $\|X\|_*$ of a matrix $X$ is defined as the sum of its singular values:

$$\|X\|_* := \sum_i \sigma_i(X). \qquad (8.32)$$
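Numerically, the nuclear norm in (8.32) is one SVD away. A short sketch (the helper name `nuclear_norm` is ours):

```python
import numpy as np

def nuclear_norm(X):
    """||X||_*: the sum of the singular values of X, as in (8.32)."""
    return np.linalg.svd(X, compute_uv=False).sum()

# Sanity check on a rank-1 matrix u v^T: its only nonzero singular
# value is |u| * |v|, so the nuclear norm below is 5 * 1 = 5.
u = np.array([3.0, 4.0])
v = np.array([1.0, 0.0, 0.0])
X = np.outer(u, v)
```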
Candès and Recht proved that if $\Omega$ is sampled uniformly at random among all subsets of cardinality $p$ and $A$ obeys a low coherence condition, then with large probability the unique solution to (8.31) is exactly $A$, provided that the number of samples obeys

$$p \ge C\, n^{6/5}\, r \log n. \qquad (8.33)$$
In [CT10] the estimate (8.33) is further improved toward the limit $nr \log n$.
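Problem (8.31) can be handed to an off-the-shelf semidefinite programming solver; for intuition, here is a minimal sketch of singular value thresholding (the SVT iteration of Cai, Candès, and Shen, an algorithm from the later literature rather than from this section), which approximately solves a regularized form of (8.31). The threshold, step size, and iteration count below are illustrative choices, not prescribed values:

```python
import numpy as np

def svt_complete(M_obs, mask, tau=None, step=1.2, iters=500):
    """Approximate matrix completion by singular value thresholding (SVT).

    M_obs holds the observed entries (zeros elsewhere); mask marks Omega.
    Each step soft-thresholds the singular values of a running iterate,
    then corrects the mismatch on the observed entries.
    """
    n1, n2 = M_obs.shape
    if tau is None:
        tau = 5 * np.sqrt(n1 * n2)  # heuristic threshold from the SVT paper
    Y = np.zeros_like(M_obs)
    X = np.zeros_like(M_obs)
    for _ in range(iters):
        U, s, Vt = np.linalg.svd(Y, full_matrices=False)
        X = U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt  # shrink singular values
        Y += step * np.where(mask, M_obs - X, 0.0)      # enforce data on Omega
    return X

# Rank-1 ground truth with roughly 60% of the entries observed.
rng = np.random.default_rng(1)
A = np.outer(rng.standard_normal(8), rng.standard_normal(8))
mask = rng.random((8, 8)) < 0.6
X_hat = svt_complete(np.where(mask, A, 0.0), mask)
```

With a step size below 2 the iteration converges, and the iterate fits the observed entries increasingly well; for incoherent low-rank matrices with enough samples in the sense of (8.33), the unobserved entries are filled in as well.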