Database Reference
In-Depth Information
Computing similarity with item-factor vectors
The benefit of factorization models is the relative ease of computing recommendations
once the model is created. However, for very large user and itemsets, this can become a
challenge as it requires storage and computation across potentially many millions of user-
and item-factor vectors. Another advantage, as mentioned earlier, is that they tend to offer
very good performance.
Note
Projects such as Oryx ( https://github.com/OryxProject/oryx ) and Prediction.io ( ht-
tps://github.com/PredictionIO/PredictionIO ) focus on model serving for large-scale mod-
els, including recommenders based on matrix factorization.
On the down side, factorization models are relatively more complex to understand and in-
terpret compared to nearest-neighbor models and are often more computationally intens-
ive during the model's training phase.
Implicit matrix factorization
So far, we have dealt with explicit preferences such as ratings. However, much of the pref-
erence data that we might be able to collect is implicit feedback, where the preferences
between a user and item are not given to us, but are, instead, implied from the interactions
they might have with an item. Examples include binary data (such as whether a user
viewed a movie, whether they purchased a product, and so on) as well as count data (such
as the number of times a user watched a movie).
There are many different approaches to deal with implicit data. MLlib implements a par-
ticular approach that treats the input rating matrix as two matrices: a binary preference
matrix, P , and a matrix of confidence weights, C .
For example, let's assume that the user-movie ratings we saw previously were, in fact, the
number of times each user had viewed that movie. The two matrices would look
something like ones shown in the following screenshot. Here, the matrix P informs us that
a movie was viewed by a user, and the matrix C represents the confidence weighting, in
the form of the view counts—generally, the more a user has watched a movie, the higher
the confidence that they actually like it.
Search WWH ::




Custom Search