✦ Dimensionality Reduction by PCA: By representing the matrix of points by a small number of its eigenvectors, we can approximate the data in a way that minimizes the root-mean-square error for the given number of columns in the representing matrix.
✦ Singular-Value Decomposition: The singular-value decomposition of a matrix consists of three matrices, U, Σ, and V. The matrices U and V are column-orthonormal, meaning that as vectors, the columns are orthogonal and their lengths are 1. The matrix Σ is a diagonal matrix, and the values along its diagonal are called singular values. The product of U, Σ, and the transpose of V equals the original matrix.
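These defining properties can be checked directly in NumPy (assumed here; the example matrix is hypothetical):

```python
import numpy as np

# A small example matrix (hypothetical values).
M = np.array([[1.0, 1.0, 0.0],
              [3.0, 3.0, 0.0],
              [0.0, 0.0, 4.0]])

# full_matrices=False gives the "thin" SVD: M = U @ diag(s) @ VT.
U, s, VT = np.linalg.svd(M, full_matrices=False)

# U and V are column-orthonormal: U^T U = I and V^T V = I.
assert np.allclose(U.T @ U, np.eye(U.shape[1]))
assert np.allclose(VT @ VT.T, np.eye(VT.shape[0]))

# The product U Sigma V^T recovers the original matrix.
reconstructed = U @ np.diag(s) @ VT
```

Note that `np.linalg.svd` returns Σ as a vector of singular values (in decreasing order) and V already transposed.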
✦ Concepts: SVD is useful when there are a small number of concepts that connect the rows and columns of the original matrix. For example, if the original matrix represents the ratings given by movie viewers (rows) to movies (columns), the concepts might be the genres of the movies. The matrix U connects rows to concepts, Σ represents the strengths of the concepts, and V connects the concepts to columns.
✦ Queries Using the Singular-Value Decomposition: We can use the decomposition to relate new or hypothetical rows of the original matrix to the concepts represented by the decomposition. Multiply a row by the matrix V of the decomposition to get a vector indicating the extent to which that row matches each of the concepts.
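A sketch of such a query, continuing the movie-ratings example (NumPy assumed; the ratings are hypothetical). A new viewer's rating row, multiplied by V, yields one coordinate per concept:

```python
import numpy as np

# Hypothetical ratings matrix: rows are viewers, columns are movies
# (three science-fiction titles, then two romances).
M = np.array([[1.0, 1.0, 1.0, 0.0, 0.0],
              [3.0, 3.0, 3.0, 0.0, 0.0],
              [4.0, 4.0, 4.0, 0.0, 0.0],
              [0.0, 0.0, 0.0, 2.0, 2.0],
              [0.0, 0.0, 0.0, 1.0, 1.0]])

U, s, VT = np.linalg.svd(M, full_matrices=False)
V = VT.T[:, :2]   # keep the two strong concepts (sci-fi and romance)

# A new viewer who rated only the first science-fiction movie.
q = np.array([4.0, 0.0, 0.0, 0.0, 0.0])

# q V maps the row into concept space: one coordinate per concept.
# (Singular vectors are determined only up to sign, so compare magnitudes.)
concepts = q @ V
```

For this viewer the science-fiction coordinate is large in magnitude while the romance coordinate is zero, so the query correctly associates the viewer with the first concept.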
✦ Using SVD for Dimensionality Reduction: In a complete SVD for a matrix, U and V are typically as large as the original. To use fewer columns for U and V, delete the columns corresponding to the smallest singular values from U, V, and Σ. This choice minimizes the error in reconstructing the original matrix from the modified U, Σ, and V.
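A minimal sketch of this truncation in NumPy (the input matrix is arbitrary random data for illustration). The Frobenius-norm error of the rank-k approximation is exactly the root of the sum of squares of the discarded singular values, and no rank-k matrix does better:

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.standard_normal((6, 4))

U, s, VT = np.linalg.svd(M, full_matrices=False)

# Keep only the k columns corresponding to the largest singular values.
k = 2
Uk, sk, VTk = U[:, :k], s[:k], VT[:k, :]
Mk = Uk @ np.diag(sk) @ VTk

# The reconstruction error equals the root of the sum of squares of the
# singular values that were dropped.
err = np.linalg.norm(M - Mk, 'fro')
```

Because the singular values come sorted in decreasing order, "delete the columns for the smallest singular values" is just slicing off the tail.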
✦ Decomposing Sparse Matrices: Even in the common case where the given matrix is sparse, the matrices constructed by SVD are dense. The CUR decomposition seeks to decompose a sparse matrix into sparse, smaller matrices whose product approximates the original matrix.
✦ CUR Decomposition: This method chooses from a given sparse matrix a set of columns C and a set of rows R, which play the roles of U and the transpose of V in SVD; the user can pick any number of rows and columns. The choice of rows and columns is made randomly, with a distribution that depends on the Frobenius norm, or the square root of the sum of the squares of the elements. Between C and R is a square matrix called U that is constructed by taking the pseudoinverse of the intersection of the chosen rows and columns.
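The sampling step can be sketched as follows, a simplified illustration rather than a full CUR implementation (NumPy assumed; the helper name `cur_decompose` and the scaling details are one common variant):

```python
import numpy as np

def cur_decompose(M, r, rng):
    """Sketch of CUR: sample r columns and r rows with probability
    proportional to their squared Frobenius norm, scale them, and build
    the middle matrix from the pseudoinverse of the intersection."""
    fro2 = np.sum(M ** 2)
    col_p = np.sum(M ** 2, axis=0) / fro2   # column-selection distribution
    row_p = np.sum(M ** 2, axis=1) / fro2   # row-selection distribution

    cols = rng.choice(M.shape[1], size=r, replace=True, p=col_p)
    rows = rng.choice(M.shape[0], size=r, replace=True, p=row_p)

    # Scale the chosen columns and rows, as is standard for CUR sampling.
    C = M[:, cols] / np.sqrt(r * col_p[cols])
    R = M[rows, :] / np.sqrt(r * row_p[rows])[:, None]

    # W is the intersection of the chosen rows and columns; its
    # Moore-Penrose pseudoinverse supplies the middle matrix U.
    W = M[np.ix_(rows, cols)]
    U = np.linalg.pinv(W)
    return C, U, R

rng = np.random.default_rng(1)
M = rng.standard_normal((8, 5))
C, U, R = cur_decompose(M, 3, rng)
approx = C @ U @ R
```

Because the rows and columns are drawn with replacement from the Frobenius-norm distribution, frequently large rows and columns are likely to be represented, and C and R inherit the sparsity of M.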
11.6 References for Chapter 11
Principal-component analysis was first discussed over a century ago, in [6].
SVD is from [3]; [4] is a standard reference on matrix computations. There have been many applications of this idea. Two worth mentioning are latent semantic indexing [1] and the CUR decomposition [2]; [5] extends CUR-style decompositions to tensors.
[1] S. Deerwester, S.T. Dumais, G.W. Furnas, T.K. Landauer, and R. Harshman, "Indexing by latent semantic analysis," J. American Society for Information Science 41:6 (1990).
[2] P. Drineas, R. Kannan, and M.W. Mahoney, "Fast Monte Carlo algorithms for matrices III: Computing a compressed approximate matrix decomposition," SIAM J. Computing 36:1 (2006), pp. 184-206.
[3] G.H. Golub and W. Kahan, "Calculating the singular values and pseudo-inverse of a matrix," J. SIAM Series B 2:2 (1965), pp. 205-224.
[4] G.H. Golub and C.F. Van Loan, Matrix Computations, JHU Press, 1996.
[5] M.W. Mahoney, M. Maggioni, and P. Drineas, "Tensor-CUR decompositions for tensor-based data," SIGKDD (2006), pp. 327-336.
[6] K. Pearson, "On lines and planes of closest fit to systems of points in space," Philosophical Magazine 2:11 (1901), pp. 559-572.