Building a Recommendation Engine with Spark - Machine Learning with Spark

Collaborative filtering

Collaborative filtering is a wisdom-of-the-crowd approach in which the preferences of many users with respect to items are used to estimate the preferences of users for items with which they have not yet interacted. The underlying idea is the notion of similarity.

In a user-based approach, if two users have exhibited similar preferences (that is, patterns of interacting with the same items in broadly the same way), then we assume that they are similar to each other in terms of taste. To generate recommendations for unknown items for a given user, we can use the known preferences of other users who exhibit similar behavior. We can do this by selecting a set of similar users and computing some form of combined score based on the items they have shown a preference for. The overall logic is that if users with similar tastes have shown a preference for a set of items, these items would tend to be good candidates for recommendation.
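
The user-based logic can be illustrated with a minimal sketch in plain Python. Everything here is an assumption made for illustration: the toy ratings, the names `cosine` and `predict_user_based`, the choice of cosine similarity, and the similarity-weighted average used as the combined score. Other similarity measures and scoring rules are equally valid.

```python
from math import sqrt

# Hypothetical toy ratings (user -> {item: rating}); not taken from the text.
ratings = {
    "alice": {"A": 5.0, "B": 4.0, "C": 1.0},
    "bob":   {"A": 4.0, "B": 5.0, "D": 5.0},
    "carol": {"A": 1.0, "C": 5.0, "D": 2.0},
}

def cosine(u, v):
    """Cosine similarity of two sparse rating vectors stored as dicts."""
    shared = set(u) & set(v)
    if not shared:
        return 0.0
    dot = sum(u[key] * v[key] for key in shared)
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v)

def predict_user_based(ratings, user, item, k=2):
    """Estimate a rating as the similarity-weighted average over the
    k most similar users who have rated the item."""
    neighbors = [
        (cosine(ratings[user], prefs), prefs[item])
        for other, prefs in ratings.items()
        if other != user and item in prefs
    ]
    neighbors = sorted(neighbors, reverse=True)[:k]
    total_sim = sum(sim for sim, _ in neighbors)
    if total_sim == 0:
        return None  # no similar user has rated this item
    return sum(sim * r for sim, r in neighbors) / total_sim

# alice has not rated item "D"; bob and carol have.
estimate = predict_user_based(ratings, "alice", "D")
```

Because alice's ratings resemble bob's far more than carol's, the estimate for item "D" is pulled toward bob's rating rather than carol's.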

We can also take an item-based approach that computes some measure of similarity between items. This is usually based on the existing user-item preferences or ratings. Items that tend to be rated the same by similar users will be classed as similar under this approach. Once we have these similarities, we can represent a user in terms of the items they have interacted with and find items that are similar to these known items, which we can then recommend to the user. Again, a set of items similar to the known items is used to generate a combined score that estimates the user's preference for an unknown item.
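
A comparable sketch for the item-based approach follows. As before, the toy data and the names `invert`, `cosine`, and `predict_item_based` are hypothetical, and cosine similarity with a weighted average is only one possible choice. The difference from the user-based view is that similarity is computed between items, each represented by the ratings it has received.

```python
from math import sqrt

# Hypothetical toy ratings (user -> {item: rating}); not taken from the text.
ratings = {
    "alice": {"A": 5.0, "B": 4.0, "C": 1.0},
    "bob":   {"A": 4.0, "B": 5.0, "D": 5.0},
    "carol": {"A": 1.0, "C": 5.0, "D": 2.0},
}

def invert(ratings):
    """Turn user -> {item: rating} into item -> {user: rating}."""
    items = {}
    for user, prefs in ratings.items():
        for item, r in prefs.items():
            items.setdefault(item, {})[user] = r
    return items

def cosine(u, v):
    """Cosine similarity of two sparse rating vectors stored as dicts."""
    shared = set(u) & set(v)
    if not shared:
        return 0.0
    dot = sum(u[key] * v[key] for key in shared)
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v)

def predict_item_based(ratings, user, item, k=2):
    """Estimate a rating from the user's own ratings of the k items
    most similar to the target item."""
    items = invert(ratings)
    if item not in items:
        return None  # nobody has rated this item, so no similarity exists
    scored = [
        (cosine(items[item], items[known]), r)
        for known, r in ratings[user].items()
        if known != item
    ]
    scored = sorted(scored, reverse=True)[:k]
    total_sim = sum(sim for sim, _ in scored)
    if total_sim == 0:
        return None
    return sum(sim * r for sim, r in scored) / total_sim

# Estimate alice's preference for "D" from the items she has already rated.
estimate = predict_item_based(ratings, "alice", "D")
```

Item similarities depend only on the rating data, not on the user being served, so in practice they can be computed once and reused across users.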

The user- and item-based approaches are usually referred to as nearest-neighbor models, since the estimated scores are computed based on the set of most similar users or items (that is, their neighbors).

Finally, there are many model-based methods that attempt to model the user-item preferences themselves so that new preferences can be estimated directly by applying the model to unknown user-item combinations.

Matrix factorization

Since Spark's recommendation models currently only include an implementation of matrix factorization, we will focus our attention on this class of models. There is good reason for this focus, however: these types of models have consistently been shown to perform extremely well in collaborative filtering and were among the best models in well-known competitions such as the Netflix prize.
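
To make the idea of matrix factorization concrete, here is a minimal sketch in plain Python. It assumes each user and each item is represented by a short vector of latent factors, and that a rating is estimated as the dot product of the two vectors. The training loop uses plain stochastic gradient descent purely for simplicity; it is a swapped-in technique for illustration and is not Spark's implementation. The toy data, the hyperparameter values, and the names `init_factors`, `train`, and `rmse` are all hypothetical.

```python
import random
from math import sqrt

# Hypothetical (user, item, rating) triples; toy data, not taken from the text.
observed = [
    ("alice", "A", 5.0), ("alice", "B", 4.0), ("alice", "C", 1.0),
    ("bob", "A", 4.0), ("bob", "B", 5.0), ("bob", "D", 5.0),
    ("carol", "A", 1.0), ("carol", "C", 5.0), ("carol", "D", 2.0),
]

def dot(p, q):
    return sum(a * b for a, b in zip(p, q))

def rmse(observed, P, Q):
    """Root mean squared error over the observed ratings."""
    sq_err = sum((r - dot(P[u], Q[i])) ** 2 for u, i, r in observed)
    return sqrt(sq_err / len(observed))

def init_factors(observed, n_factors=2, seed=0):
    """Small random factor vectors for every user and every item."""
    rng = random.Random(seed)
    users = sorted({u for u, _, _ in observed})
    items = sorted({i for _, i, _ in observed})
    P = {u: [rng.uniform(0.1, 1.0) for _ in range(n_factors)] for u in users}
    Q = {i: [rng.uniform(0.1, 1.0) for _ in range(n_factors)] for i in items}
    return P, Q

def train(observed, P, Q, n_epochs=500, lr=0.01, reg=0.02):
    """Update the factors in place with L2-regularized SGD steps."""
    for _ in range(n_epochs):
        for u, i, r in observed:
            err = r - dot(P[u], Q[i])
            for f in range(len(P[u])):
                pu, qi = P[u][f], Q[i][f]
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q

P, Q = init_factors(observed)
before = rmse(observed, P, Q)
train(observed, P, Q)
after = rmse(observed, P, Q)

# The model can now score a combination that was never observed.
estimate = dot(P["alice"], Q["D"])
```

Once the factors are learned, estimating a preference for any user-item pair is a single dot product, which is what makes this a model-based method rather than a nearest-neighbor one.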
