where U is m × k, S is k × k, and Vᵀ is k × n (equivalently, V is n × k), the columns of U and V are pairwise orthogonal, and S is diagonal. Note that the standard statement of the SVD is slightly more involved: it has U and V both square unitary matrices, and the middle "diagonal" matrix rectangular. We'll be using this form because we're going to be taking approximations to X of increasingly smaller rank. You can find the proof of the existence of this form as a step in the proof of the existence of the general form here.
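To see what that form looks like in practice, here's a minimal sketch using numpy (not from the original text; the matrix X below is random and purely illustrative). Note that numpy returns the singular values as a vector rather than as the diagonal matrix S, and hands back Vᵀ directly:

import numpy as np

# A made-up m x n matrix, just to illustrate the shapes.
m, n = 5, 4
X = np.random.rand(m, n)

# full_matrices=False gives the "economy" form described above:
# U is m x k, s holds the k singular values (the diagonal of S),
# and Vt is k x n, where k = min(m, n) for this full-rank X.
U, s, Vt = np.linalg.svd(X, full_matrices=False)
S = np.diag(s)

print(U.shape, S.shape, Vt.shape)   # (5, 4) (4, 4) (4, 4)

# The columns of U (and of V = Vt.T) are pairwise orthogonal...
print(np.allclose(U.T @ U, np.eye(len(s))))
print(np.allclose(Vt @ Vt.T, np.eye(len(s))))
# ...and multiplying the three factors back together recovers X.
print(np.allclose(U @ S @ Vt, X))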
Let's apply the preceding matrix decomposition to our situation. X is our original dataset, which has users' ratings of items. We have m users, n items, and k would be the rank of X, and consequently would also be an upper bound on the number d of latent variables we decide to care about (note that we choose d, whereas m, n, and k are determined by our training dataset). So just like in k-NN, where k is a tuning parameter (a different k entirely, not trying to confuse you!), in this case d is the tuning parameter.
Each row of U corresponds to a user, whereas V has a row for each item. The values along the diagonal of the square matrix S are called the "singular values." They measure the importance of each latent variable: the most important latent variable has the biggest singular value.
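To connect these pieces to the ratings setting, here's a small sketch (the tiny ratings matrix and its values are invented purely for illustration): each row of U holds one user's weights on the latent variables, each row of V holds one item's weights, and numpy returns the singular values already sorted from largest to smallest.

import numpy as np

# Invented ratings matrix: 4 users (rows) x 3 items (columns).
ratings = np.array([
    [5.0, 3.0, 1.0],
    [4.0, 3.0, 1.0],
    [1.0, 1.0, 5.0],
    [1.0, 2.0, 4.0],
])

U, s, Vt = np.linalg.svd(ratings, full_matrices=False)
V = Vt.T  # V now has a row for each item, as in the text

print("user factors (one row per user):\n", U)
print("item factors (one row per item):\n", V)
print("singular values, largest first:", s)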
Important Properties of SVD
Because the columns of U and V are orthogonal to each other, you can order the columns by singular values via a change-of-basis operation. That way, if you put the columns in decreasing order of their corresponding singular values (which you do), then the dimensions are ordered by importance from highest to lowest. You can take a lower-rank approximation of X by throwing away part of S. In other words, replace S by a submatrix taken from the upper-left corner of S.
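Here's a minimal sketch of that truncation with numpy (the ratings matrix is invented and d = 2 is an arbitrary choice, both purely for illustration):

import numpy as np

# Invented ratings matrix, for illustration only.
X = np.array([
    [5.0, 3.0, 1.0, 1.0],
    [4.0, 3.0, 1.0, 2.0],
    [1.0, 1.0, 5.0, 4.0],
    [1.0, 2.0, 4.0, 5.0],
])

U, s, Vt = np.linalg.svd(X, full_matrices=False)

d = 2  # number of latent variables we decide to care about

# Keep the upper-left d x d corner of S, along with the matching
# columns of U and rows of Vt (i.e., columns of V).
X_d = U[:, :d] @ np.diag(s[:d]) @ Vt[:d, :]

print(np.round(X_d, 2))                          # rank-d approximation of X
print("approximation error:", np.linalg.norm(X - X_d))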
Of course, if you cut off part of S you'd have to simultaneously cut off part of U and part of V, but this is OK because you're cutting off the least important vectors. This is essentially how you choose the number of latent variables d: you no longer have the original matrix X, only an approximation of it, because d is typically much smaller than k, but it's still pretty close to X. This is what people mean when they talk about "compression," if you've ever heard that term thrown around.
There is often an important interpretation to the values in the matrices U and V. For example, you can see, by using SVD, that the