Recommendation Systems - Mining of Massive Datasets

Databases Reference

In-Depth Information

Initialization

As we mentioned, it is essential that there be some randomness in the way we

seek an optimum solution, because the existence of many local minima justifies

our running many different optimizations in the hope of reaching the global

minimum on at least one run. We can vary the initial values of U and V , or we

can vary the way we seek the optimum (to be discussed next), or both.

A simple starting point for U and V is to give each element the same value,

and a good choice for this value is that which gives the elements of the product

U V the average of the nonblank elements of M . Note that if we have normalized

M , then this value will necessarily be 0. If we have chosen d as the lengths of

the short sides of U and V , and a is the average nonblank element of M , then

the elements of U and V should be

a/d.

If we want many starting points for U and V , then we can perturb the

value

a/d randomly and independently for each of the elements. There are

many options for how we do the perturbation. We have a choice regarding

the distribution of the difference. For example we could add to each element a

normally distributed value with mean 0 and some chosen standard deviation.

Or we could add a value uniformly chosen from the range−c to +c for some c.

Performing the Optimization

In order to reach a local minimum from a given starting value of U and V , we

need to pick an order in which we visit the elements of U and V . The simplest

thing to do is pick an order, e.g., row-by-row, for the elements of U and V ,

and visit them in round-robin fashion. Note that just because we optimized an

element once does not mean we cannot find a better value for that element after

other elements have been adjusted. Thus, we need to visit elements repeatedly,

until we have reason to believe that no further improvements are possible.

Alternatively, we can follow many different optimization paths from a single

starting value by randomly picking the element to optimize. To make sure that

every element is considered in each round, we could instead choose a permuta-

tion of the elements and follow that order for every round.

Converging to a Minimum

Ideally, at some point the RMSE becomes 0, and we know we cannot do better.

In practice, since there are normally many more nonblank elements in M than

there are elements in U and V together, we have no right to expect that we

can reduce the RMSE to 0. Thus, we have to detect when there is little benefit

to be had in revisiting elements of U and/or V . We can track the amount of

improvement in the RMSE obtained in one round of the optimization, and stop

when that improvement falls below a threshold. A small variation is to observe

the improvements resulting from the optimization of individual elements, and

stop when the maximum improvement during a round is below a threshold.

Search WWH ::

Custom Search

Home