Dimensionality Reduction - Mining of Massive Datasets

Database Reference

In-Depth Information

(f) How much of the energy of the original singular values is retained by the one-di-

mensional approximation?

EXERCISE 11.3.2 Use the SVD from Fig. 11.7 . Suppose Leslie assigns rating 3 to Alien and

rating 4 to Titanic , giving us a representation of Leslie in “movie space” of [0, 3, 0, 0, 4].

Find the representation of Leslie in concept space. What does that representation predict

about how well Leslie would like the other movies appearing in our example data?

! EXERCISE 11.3.3 Demonstrate that the rank of the matrix in Fig. 11.8 is 3.

! EXERCISE 11.3.4 Section 11.3.5 showed how to guess the movies a person would most

like. How would you use a similar technique to guess the people that would most like a

given movie, if all you had were the ratings of that movie by a few people?

11.4 CUR Decomposition

There is a problem with SVD that does not show up in the running example of Section

11.3 . In large-data applications, it is normal for the matrix M being decomposed to be very

sparse; that is, most entries are 0. For example, a matrix representing many documents (as

rows) and the words they contain (as columns) will be sparse, because most words are not

present in most documents. Similarly, a matrix of customers and products will be sparse

because most people do not buy most products.

We cannot deal with dense matrices that have millions or billions of rows and/or

columns. However, with SVD, even if M is sparse, U and V will be dense. 4 Since Σ is di-

agonal it will be sparse, but Σ is usually much smaller than U and V , so its sparseness does

not help.

In this section, we shall consider another approach to decomposition, called CUR-de-

composition. The merit of this approach lies in the fact that if M is sparse, then the two

large matrices (called C and R for “columns” and “rows”) analogous to U and V in SVD

are also sparse. Only the matrix in the middle (analogous to Σ in SVD) is dense, but this

matrix is small so the density does not hurt too much.

Unlike SVD, which gives an exact decomposition as long as the parameter r is taken to

be at least as great as the rank of the matrix M , CUR-decomposition is an approximation

no matter how large we make r . There is a theory that guarantees convergence to M as r

gets larger, but typically you have to make r so large to get, say within 1% that the method

becomes impractical. Nevertheless, a decomposition with a relatively small value of r has

a good probability of being a useful and accurate decomposition.

Search WWH ::

Custom Search

Home