Database Reference
In-Depth Information
11
Dimensionality Reduction
There are many sources of data that can be viewed as a large matrix. We saw in Chapter 5
how the Web can be represented as a transition matrix. In Chapter 9 , the utility matrix was
a point of focus. And in Chapter 10 we examined matrices that represent social networks.
In many of these matrix applications, the matrix can be summarized by finding “narrow-
er” matrices that in some sense are close to the original. These narrow matrices have only a
small number of rows or a small number of columns, and therefore can be used much more
efficiently than can the original large matrix. The process of finding these narrow matrices
is called dimensionality reduction .
We saw a preliminary example of dimensionality reduction in Section 9.4 . There, we dis-
cussed UV-decomposition of a matrix and gave a simple algorithm for finding this decom-
position. Recall that a large matrix M was decomposed into two matrices U and V whose
product UV was approximately M . The matrix U had a small number of columns whereas V
had a small number of rows, so each was significantly smaller than M , and yet together they
represented most of the information in M that was useful in predicting ratings of items by
individuals.
In this chapter we shall explore the idea of dimensionality reduction in more detail. We
begin with a discussion of eigenvalues and their use in “principal component analysis”
(PCA). We cover singular-value decomposition, a more powerful version of UV-decompos-
ition. Finally, because we are always interested in the largest data sizes we can handle, we
look at another form of decomposition, called CUR-decomposition, which is a variant of
singular-value decomposition that keeps the matrices of the decomposition sparse if the ori-
ginal matrix is sparse.
11.1 Eigenvalues and Eigenvectors
We shall assume that you are familiar with the basics of matrix algebra: multiplication, trans-
pose, determinants, and solving linear equations for example. In this section, we shall define
eigenvalues and eigenvectors of a symmetric matrix and show how to find them. Recall a
Search WWH ::




Custom Search