where U is m × k, S is k × k, and Vᵀ is k × n (equivalently, V is n × k), the columns of U and V are pairwise orthogonal, and S is diagonal. Note that the standard statement of the SVD is slightly more involved: it has U and V both square unitary matrices, and the middle "diagonal" matrix rectangular. We'll be using this form because we're going to be taking approximations to X of increasingly smaller rank. You can find the proof of the existence of this form as a step in the proof of the existence of the general form here.
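To see what that form looks like in practice, here's a minimal sketch using numpy (not from the original text; the matrix X below is random and purely illustrative). Note that numpy returns the singular values as a vector rather than as the diagonal matrix S, and hands back Vᵀ directly:

import numpy as np

# A made-up m x n matrix, just to illustrate the shapes.
m, n = 5, 4
X = np.random.rand(m, n)

# full_matrices=False gives the "economy" form described above:
# U is m x k, s holds the k singular values (the diagonal of S),
# and Vt is k x n, where k = min(m, n) for this full-rank X.
U, s, Vt = np.linalg.svd(X, full_matrices=False)
S = np.diag(s)

print(U.shape, S.shape, Vt.shape)   # (5, 4) (4, 4) (4, 4)

# The columns of U (and of V = Vt.T) are pairwise orthogonal...
print(np.allclose(U.T @ U, np.eye(len(s))))
print(np.allclose(Vt @ Vt.T, np.eye(len(s))))
# ...and multiplying the three factors back together recovers X.
print(np.allclose(U @ S @ Vt, X))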
Let's apply the preceding matrix decomposition to our situation. X is our original dataset, which has users' ratings of items. We have m users, n items, and k would be the rank of X, and consequently would also be an upper bound on the number d of latent variables we decide to care about (note that we choose d, whereas m, n, and k are determined by our training dataset). So just like in k-NN, where k is a tuning parameter (a different k entirely, not trying to confuse you!), in this case d is the tuning parameter.
Each row of U corresponds to a user, whereas V has a row for each item. The values along the diagonal of the square matrix S are called the "singular values." They measure the importance of each latent variable: the most important latent variable has the biggest singular value.
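To connect these pieces to the ratings setting, here's a small sketch (the tiny ratings matrix and its values are invented purely for illustration): each row of U holds one user's weights on the latent variables, each row of V holds one item's weights, and numpy returns the singular values already sorted from largest to smallest.

import numpy as np

# Invented ratings matrix: 4 users (rows) x 3 items (columns).
ratings = np.array([
    [5.0, 3.0, 1.0],
    [4.0, 3.0, 1.0],
    [1.0, 1.0, 5.0],
    [1.0, 2.0, 4.0],
])

U, s, Vt = np.linalg.svd(ratings, full_matrices=False)
V = Vt.T  # V now has a row for each item, as in the text

print("user factors (one row per user):\n", U)
print("item factors (one row per item):\n", V)
print("singular values, largest first:", s)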
Important Properties of SVD
Because the columns of U and V are orthogonal to each other, you can order the columns by singular values via a change-of-basis operation. That way, if you put the columns in decreasing order of their corresponding singular values (which you do), then the dimensions are ordered by importance from highest to lowest. You can take a lower-rank approximation of X by throwing away part of S. In other words, replace S by a submatrix taken from the upper-left corner of S.
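Here's a minimal sketch of that truncation with numpy (the ratings matrix is invented and d = 2 is an arbitrary choice, both purely for illustration):

import numpy as np

# Invented ratings matrix, for illustration only.
X = np.array([
    [5.0, 3.0, 1.0, 1.0],
    [4.0, 3.0, 1.0, 2.0],
    [1.0, 1.0, 5.0, 4.0],
    [1.0, 2.0, 4.0, 5.0],
])

U, s, Vt = np.linalg.svd(X, full_matrices=False)

d = 2  # number of latent variables we decide to care about

# Keep the upper-left d x d corner of S, along with the matching
# columns of U and rows of Vt (i.e., columns of V).
X_d = U[:, :d] @ np.diag(s[:d]) @ Vt[:d, :]

print(np.round(X_d, 2))                          # rank-d approximation of X
print("approximation error:", np.linalg.norm(X - X_d))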
Of course, if you cut off part of S you'd have to simultaneously cut off part of U and part of V, but this is OK because you're cutting off the least important vectors. This is essentially how you choose the number of latent variables d: you no longer have the original matrix X, only an approximation of it, because d is typically much smaller than k, but it's still pretty close to X. This is what people mean when they talk about "compression," if you've ever heard that term thrown around.
There is often an important interpretation to the values in the matrices U and V. For example, you can see, by using SVD, that the