Database Reference
In-Depth Information
Singular Value Decomposition
SVD seeks to decompose a matrix X of dimension m x n into three component matrices:
U of dimension m x m
S , a diagonal matrix of size m x n ; the entries of S are referred to as the singular
values
V T of dimension n x n
X = U * S * V T
Looking at the preceding formula, it appears that we have not reduced the dimensionality
of the problem at all, as by multiplying U , S , and V , we reconstruct the original matrix. In
practice, the truncated SVD is usually computed. That is, only the top k singular values,
which represent the most variation in the data, are kept, while the rest are discarded. The
formula to reconstruct X based on the component matrices is then approximate:
X ~ U k * S k * V k T
An illustration of the truncated SVD is shown here:
The truncated Singular Value Decomposition
Keeping the top k singular values is similar to keeping the top k principal components in
PCA. In fact, SVD and PCA are directly related, as we will see a little later in this chapter.
Note
A detailed mathematical treatment of both PCA and SVD is beyond the scope of this topic.
An overview of dimensionality reduction can be found in the Spark documentation at ht-
tp://spark.apache.org/docs/latest/mllib-dimensionality-reduction.html .
Search WWH ::




Custom Search