Dimensionality Reduction - Mining of Massive Datasets

Database Reference

In-Depth Information

Why the Pseudoinverse Works

In general suppose a matrix M is equal to a product of matrices XZY . If all the inverses exist, then the rule for inverse

of a product tell us M −1 = Y −1 Z −1 X −1 . Since in the case we are interested in, XZY is an SVD, we know X is column-or-

thonormal and Y is row-orthonormal. In either of these cases, the inverse and the transpose are the same. That is, XX T

is an identity matrix of the appropriate size, and so is Y Y T . Thus, M −1 = Y T Z −1 X T .

We also know Z is a diagonal matrix. If there are no 0's along the diagonal, then Z −1 is formed from Z by taking the

numerical inverse of each diagonal element. It is only when there are 0's along the diagonal of Z that we are unable

to find an element for the same position in the inverse such that we can get an identity matrix when we multiply Z by

its inverse. That is why we resort to a “pseudoinverse,” accepting the fact that the product ZZ + will not be an identity

matrix, but rather a diagonal matrix where the i th diagonal entry is 1 if the i th element of Z is nonzero and 0 if the i th

element of Z is 0.

11.4.1

Definition of CUR

Let M be a matrix of m rows and n columns. Pick a target number of “concepts” r to be used

in the decomposition. A CUR-decomposition of M is a randomly chosen set of r columns

of M , which form the m × r matrix C , and a randomly chosen set of r rows of M , which

form the r × n matrix R . There is also an r × r matrix U that is constructed from C and R as

follows:

(1) Let W be the r × r matrix that is the intersection of the chosen columns of C and the

chosen rows of R . That is, the element in row i and column j of W is the element of M

whose column is the j th column of C and whose row is the i th row of R .

(2) Compute the SVD of W ; say W = X Σ Y T .

(3) Compute Σ + , the Moore-Penrose pseudoinverse of the diagonal matrix Σ. That is, if

the i th diagonal element of Σ is σ ≠ 0, then replace it by 1/ σ . But if the i th element is 0,

leave it as 0.

(4) Let U = Y (Σ + ) 2 X T .

We shall defer to Section 11.4.3 an example where we illustrate the entire CUR process,

including the important matter of how the matrices C and R should be chosen to make the

approximation to M have a small expected value.

11.4.2

Choosing Rows and Columns Properly

Recall that the choice of rows and columns is random. However, this choice must be biased

so that the more important rows and columns have a better chance of being picked. The

Search WWH ::

Custom Search

Home