Database Reference
In-Depth Information
11
Dimensionality Reduction
There are many sources of data that can be viewed as a large matrix. We saw in
Chapter 5
how the Web can be represented as a transition matrix. In
Chapter 9
, the utility matrix was
a point of focus. And in
Chapter 10
we examined matrices that represent social networks.
In many of these matrix applications, the matrix can be summarized by finding “narrow-
er” matrices that in some sense are close to the original. These narrow matrices have only a
small number of rows or a small number of columns, and therefore can be used much more
efficiently than can the original large matrix. The process of finding these narrow matrices
is called
dimensionality reduction
.
We saw a preliminary example of dimensionality reduction in
Section 9.4
. There, we dis-
cussed UV-decomposition of a matrix and gave a simple algorithm for finding this decom-
position. Recall that a large matrix
M
was decomposed into two matrices
U
and
V
whose
product
UV
was approximately
M
. The matrix
U
had a small number of columns whereas
V
had a small number of rows, so each was significantly smaller than
M
, and yet together they
represented most of the information in
M
that was useful in predicting ratings of items by
individuals.
In this chapter we shall explore the idea of dimensionality reduction in more detail. We
begin with a discussion of eigenvalues and their use in “principal component analysis”
(PCA). We cover singular-value decomposition, a more powerful version of UV-decompos-
ition. Finally, because we are always interested in the largest data sizes we can handle, we
look at another form of decomposition, called CUR-decomposition, which is a variant of
singular-value decomposition that keeps the matrices of the decomposition sparse if the ori-
ginal matrix is sparse.
11.1 Eigenvalues and Eigenvectors
We shall assume that you are familiar with the basics of matrix algebra: multiplication, trans-
pose, determinants, and solving linear equations for example. In this section, we shall define
eigenvalues and eigenvectors of a symmetric matrix and show how to find them. Recall a