Database Reference
In-Depth Information
Why the Pseudoinverse Works
In general suppose a matrix M is equal to a product of matrices XZY . If all the inverses exist, then the rule for inverse
of a product tell us M −1 = Y −1 Z −1 X −1 . Since in the case we are interested in, XZY is an SVD, we know X is column-or-
thonormal and Y is row-orthonormal. In either of these cases, the inverse and the transpose are the same. That is, XX T
is an identity matrix of the appropriate size, and so is Y Y T . Thus, M −1 = Y T Z −1 X T .
We also know Z is a diagonal matrix. If there are no 0's along the diagonal, then Z −1 is formed from Z by taking the
numerical inverse of each diagonal element. It is only when there are 0's along the diagonal of Z that we are unable
to find an element for the same position in the inverse such that we can get an identity matrix when we multiply Z by
its inverse. That is why we resort to a “pseudoinverse,” accepting the fact that the product ZZ + will not be an identity
matrix, but rather a diagonal matrix where the i th diagonal entry is 1 if the i th element of Z is nonzero and 0 if the i th
element of Z is 0.
11.4.1
Definition of CUR
Let M be a matrix of m rows and n columns. Pick a target number of “concepts” r to be used
in the decomposition. A CUR-decomposition of M is a randomly chosen set of r columns
of M , which form the m × r matrix C , and a randomly chosen set of r rows of M , which
form the r × n matrix R . There is also an r × r matrix U that is constructed from C and R as
follows:
(1) Let W be the r × r matrix that is the intersection of the chosen columns of C and the
chosen rows of R . That is, the element in row i and column j of W is the element of M
whose column is the j th column of C and whose row is the i th row of R .
(2) Compute the SVD of W ; say W = X Σ Y T .
(3) Compute Σ + , the Moore-Penrose pseudoinverse of the diagonal matrix Σ. That is, if
the i th diagonal element of Σ is σ ≠ 0, then replace it by 1/ σ . But if the i th element is 0,
leave it as 0.
(4) Let U = Y + ) 2 X T .
We shall defer to Section 11.4.3 an example where we illustrate the entire CUR process,
including the important matter of how the matrices C and R should be chosen to make the
approximation to M have a small expected value.
11.4.2
Choosing Rows and Columns Properly
Recall that the choice of rows and columns is random. However, this choice must be biased
so that the more important rows and columns have a better chance of being picked. The
Search WWH ::




Custom Search