Databases Reference
In-Depth Information
Initialization
As we mentioned, it is essential that there be some randomness in the way we
seek an optimum solution, because the existence of many local minima justifies
our running many different optimizations in the hope of reaching the global
minimum on at least one run. We can vary the initial values of U and V , or we
can vary the way we seek the optimum (to be discussed next), or both.
A simple starting point for U and V is to give each element the same value,
and a good choice for this value is that which gives the elements of the product
U V the average of the nonblank elements of M . Note that if we have normalized
M , then this value will necessarily be 0. If we have chosen d as the lengths of
the short sides of U and V , and a is the average nonblank element of M , then
the elements of U and V should be
a/d.
If we want many starting points for U and V , then we can perturb the
value
a/d randomly and independently for each of the elements. There are
many options for how we do the perturbation. We have a choice regarding
the distribution of the difference. For example we could add to each element a
normally distributed value with mean 0 and some chosen standard deviation.
Or we could add a value uniformly chosen from the range−c to +c for some c.
Performing the Optimization
In order to reach a local minimum from a given starting value of U and V , we
need to pick an order in which we visit the elements of U and V . The simplest
thing to do is pick an order, e.g., row-by-row, for the elements of U and V ,
and visit them in round-robin fashion. Note that just because we optimized an
element once does not mean we cannot find a better value for that element after
other elements have been adjusted. Thus, we need to visit elements repeatedly,
until we have reason to believe that no further improvements are possible.
Alternatively, we can follow many different optimization paths from a single
starting value by randomly picking the element to optimize. To make sure that
every element is considered in each round, we could instead choose a permuta-
tion of the elements and follow that order for every round.
Converging to a Minimum
Ideally, at some point the RMSE becomes 0, and we know we cannot do better.
In practice, since there are normally many more nonblank elements in M than
there are elements in U and V together, we have no right to expect that we
can reduce the RMSE to 0. Thus, we have to detect when there is little benefit
to be had in revisiting elements of U and/or V . We can track the amount of
improvement in the RMSE obtained in one round of the optimization, and stop
when that improvement falls below a threshold. A small variation is to observe
the improvements resulting from the optimization of individual elements, and
stop when the maximum improvement during a round is below a threshold.
Search WWH ::




Custom Search