Assume for now we only modify with determinant 1 matrices G; i.e.,
we restrict ourselves to volume-preserving transformations. If we ig‐
nore for now the size of the entries of V and concentrate only on the
size of the entries of U, we are minimizing the surface area of a
d-dimensional parallelepiped in n-space (specifically, the one generated
by the columns of U) where the volume is fixed. This is achieved by
making the sides of the parallelepiped mutually orthogonal, which is
the same as saying the latent features will be uncorrelated.
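One way to make that step precise (our gloss, using the sum of squared entries as the measure of size) combines the AM-GM inequality with Hadamard's inequality. Writing u_1, ..., u_d for the columns of U,

$$\sum_{j=1}^{d} \lVert u_j \rVert^2 \;\ge\; d\Bigl(\prod_{j=1}^{d} \lVert u_j \rVert^2\Bigr)^{1/d} \;\ge\; d\,\det(U^{\top}U)^{1/d},$$

and det(U^T U) is the squared volume of the parallelepiped, which is being held fixed. The minimum is reached exactly when the columns are mutually orthogonal (equality in Hadamard) and of equal length (equality in AM-GM).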
But don't forget, we've ignored V! However, it turns out that V's rows
will also be mutually orthogonal when we force U's columns to be.
This is not hard to see if you keep in mind that X has its SVD as discussed
previously. In fact, the SVD and this form U · V have a lot in common,
and some people just call this an SVD algorithm, even though it's not
quite the same thing.
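As a quick illustration (our code, not from the book), you can build a rank-d factorization X ≈ U · V directly from the truncated SVD and check that U's columns and V's rows come out mutually orthogonal:

import numpy as np

# Split the singular values evenly between the two factors so that
# X is approximately U @ V, with U of shape (n, d) and V of shape (d, m).
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 20))        # toy n x m data matrix
d = 5                                      # number of latent features

U_full, s, Vt_full = np.linalg.svd(X, full_matrices=False)
U = U_full[:, :d] * np.sqrt(s[:d])
V = np.sqrt(s[:d])[:, None] * Vt_full[:d, :]

# Both Gram matrices are diagonal (up to rounding): U's columns are
# mutually orthogonal, and so are V's rows.
print(np.round(U.T @ U, 6))
print(np.round(V @ V.T, 6))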
Now we allow modifications with nontrivial determinant; so, for ex‐
ample, let G be some scaled version of the identity matrix. Then, if we
do a bit of calculus, it turns out that the best choice of scalar (i.e., the
one minimizing the sum of the squares of the entries of U and of V)
equalizes those two sums, and the minimized average is in fact the
geometric mean of those two quantities, which is cool. In other
words, we're minimizing the arithmetic mean of them with a single
parameter (the scalar), and the answer is the geometric mean.
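To spell out that bit of calculus (our notation): write a for the sum of the squared entries of U and b for the sum of the squared entries of V, and let the scalar c rescale U to cU and V to V/c. Then we are minimizing

$$f(c) = \frac{c^2 a + b/c^2}{2}, \qquad f'(c) = ca - \frac{b}{c^3} = 0 \;\Rightarrow\; c^4 = \frac{b}{a},$$

at which point the two terms are equal and the minimum value is $\sqrt{ab}$, the geometric mean of a and b.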
So that's the proof. Believe us?
Alternating Least Squares
But how do you do this? How do you actually find U and V ? In reality,
as you will see next, you're not first minimizing the squared error and
then minimizing the size of the entries of the matrices U and V . You're
actually doing both at the same time.
So your goal is to find U and V by solving the optimization problem
described earlier. This optimization doesn't have a nice closed formula
like ordinary least squares with one set of coefficients. Instead, you
need an iterative algorithm like gradient descent. As long as your
problem is convex you'll converge OK (i.e., you won't find yourself at
a local, but not global, minimum), and you can force your problem to
be convex using regularization.
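For concreteness, one standard way to write such a regularized objective (our notation; not necessarily the exact form given earlier) is

$$\min_{U,V} \sum_{(i,j)\ \mathrm{observed}} \bigl(x_{i,j} - u_i \cdot v_j\bigr)^2 \;+\; \lambda\bigl(\lVert U \rVert_F^2 + \lVert V \rVert_F^2\bigr),$$

where u_i is the ith row of U, v_j is the jth column of V, and λ > 0 controls how hard you penalize large entries.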
Here's the algorithm:
• Pick a random V .
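In outline, the remaining steps alternate: hold V fixed and solve a least squares problem for U, then hold U fixed and solve for V, and repeat until the factors stop changing much. A minimal NumPy sketch of that loop (our code, assuming a fully observed matrix X and a ridge penalty lam; real ratings data is sparse, so each update would use only the observed entries) might look like:

import numpy as np

def als(X, d, lam=0.1, n_iters=20, seed=0):
    """Toy alternating least squares for X ~ U @ V (fully observed X)."""
    rng = np.random.default_rng(seed)
    n, m = X.shape
    V = rng.standard_normal((d, m))              # step 1: pick a random V
    U = np.zeros((n, d))
    for _ in range(n_iters):
        # Hold V fixed; every row of U is a small ridge-regression solve.
        U = np.linalg.solve(V @ V.T + lam * np.eye(d), V @ X.T).T
        # Hold U fixed; every column of V is a small ridge-regression solve.
        V = np.linalg.solve(U.T @ U + lam * np.eye(d), U.T @ X)
    return U, V

# Tiny usage example on synthetic low-rank data.
rng = np.random.default_rng(1)
X = rng.standard_normal((50, 5)) @ rng.standard_normal((5, 30))
U, V = als(X, d=5)
print(np.linalg.norm(X - U @ V) / np.linalg.norm(X))   # small relative error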