Dimensionality Reduction - Mining of Massive Datasets


Figure 11.9: SVD for the matrix M′ of Fig. 11.8

We have used three columns for U, Σ, and V because they decompose a matrix of rank three. The columns of U and V still correspond to concepts. The first is still “science fiction” and the second is “romance.” It is harder to explain the third column's concept, but it doesn't matter all that much, because its weight, as given by the third nonzero entry in Σ, is very low compared with the weights of the first two concepts.

□

In the next section, we consider eliminating some of the least important concepts. For instance, we might want to eliminate the third concept in Example 11.9, since it really doesn't tell us much, and the fact that its associated singular value is so small confirms its unimportance.

11.3.3 Dimensionality Reduction Using SVD

Suppose we want to represent a very large matrix M by its SVD components U, Σ, and V, but these matrices are also too large to store conveniently. The best way to reduce the dimensionality of the three matrices is to set the smallest of the singular values to zero. If we set the s smallest singular values to 0, then we can also eliminate the corresponding s columns of U and V (equivalently, the corresponding s rows of V^T).
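The truncation just described can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the function name `truncate_svd` and the small 4 × 3 matrix are hypothetical and do not come from the text, and a truly massive matrix would call for a sparse or iterative SVD routine rather than a dense one.

```python
import numpy as np

def truncate_svd(M, s):
    """Drop the s smallest singular values of M together with the
    matching columns of U and rows of V^T (hypothetical helper)."""
    # Compact SVD: U is m x r, Vt is r x n, where r = min(m, n).
    U, sigma, Vt = np.linalg.svd(M, full_matrices=False)
    k = len(sigma) - s  # number of singular values that are kept
    if k < 1:
        raise ValueError("s must leave at least one singular value")
    # NumPy returns singular values in descending order,
    # so the s smallest ones are the last s entries.
    return U[:, :k], sigma[:k], Vt[:k, :]

# Hypothetical stand-in data, not a matrix from the text.
M = np.array([[3.0, 1.0, 0.0],
              [1.0, 3.0, 0.0],
              [0.0, 0.0, 2.0],
              [1.0, 1.0, 1.0]])

U_k, sigma_k, Vt_k = truncate_svd(M, s=1)
M_approx = U_k @ np.diag(sigma_k) @ Vt_k

# Frobenius-norm error of the approximation, and the singular value dropped.
error = np.linalg.norm(M - M_approx)
dropped = np.linalg.svd(M, compute_uv=False)[-1]
```

When a single singular value is dropped, the Frobenius-norm error of the approximation equals that singular value, so `error` and `dropped` agree up to rounding. Storing `U_k`, `sigma_k`, and `Vt_k` instead of the full factors is what saves the space.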

EXAMPLE 11.10 The decomposition of Example 11.9 has three singular values. Suppose we want to reduce the number of dimensions to two. Then we set the smallest of the singular values, which is 1.3, to zero. The effect on the expression in Fig. 11.9 is that the third column of U and the third row of V^T are multiplied only by 0's when we perform the multiplication, so this row and this column may as well not be there. That is, the approximation to M′ obtained by using only the two largest singular values is that shown in Fig. 11.10.
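One way to see why the third column and row "may as well not be there" is to write the product as a sum of rank-one terms. The notation here is an assumption introduced for illustration: u_i is the i-th column of U, v_i^T is the i-th row of V^T, and σ_i is the i-th singular value, with σ_3 = 1.3 as in the example.

```latex
\begin{align*}
M' &= \sigma_1 u_1 v_1^{T} + \sigma_2 u_2 v_2^{T} + \sigma_3 u_3 v_3^{T},
      \qquad \sigma_3 = 1.3 \\[4pt]
M' &\approx \sigma_1 u_1 v_1^{T} + \sigma_2 u_2 v_2^{T} + 0 \cdot u_3 v_3^{T} \\
   &= \begin{bmatrix} u_1 & u_2 \end{bmatrix}
      \begin{bmatrix} \sigma_1 & 0 \\ 0 & \sigma_2 \end{bmatrix}
      \begin{bmatrix} v_1^{T} \\ v_2^{T} \end{bmatrix}
\end{align*}
```

Once σ_3 is set to zero, the term containing u_3 and v_3^T contributes nothing, so the product of the three smaller matrices on the last line is identical to the product with the zeroed singular value left in place.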
