Database Reference
In-Depth Information
Singular Value Decomposition
SVD seeks to decompose a matrix
X
of dimension
m x n
into three component matrices:
•
U
of dimension
m x m
•
S
, a diagonal matrix of size
m x n
; the entries of
S
are referred to as the
singular
values
•
V
T
of dimension
n x n
X = U * S * V
T
Looking at the preceding formula, it appears that we have not reduced the dimensionality
of the problem at all, as by multiplying
U
,
S
, and
V
, we reconstruct the original matrix. In
practice, the truncated SVD is usually computed. That is, only the top
k
singular values,
which represent the most variation in the data, are kept, while the rest are discarded. The
formula to reconstruct
X
based on the component matrices is then approximate:
X ~ U
k
* S
k
* V
k T
An illustration of the truncated SVD is shown here:
The truncated Singular Value Decomposition
Keeping the top
k
singular values is similar to keeping the top
k
principal components in
PCA. In fact, SVD and PCA are directly related, as we will see a little later in this chapter.
Note
A detailed mathematical treatment of both PCA and SVD is beyond the scope of this topic.
An overview of dimensionality reduction can be found in the Spark documentation at
ht-