Big Data Analytics - Microsoft Big Data Solutions

Running a User-to-user Recommendation Job

The first job that we will run will use collaborative filtering to generate

user-to-user recommendations for the MovieLens data set. Before continuing,

be sure that you have deployed the Mahout jar files to your HDInsight

cluster.

Before starting the job, let's walk through what happens once it runs.

First, we iterate through each user, finding movies that the user has not

yet rated. Next, for each of those movies, we find the other users who have

rated it. We use a statistical measure to calculate the similarity between

the two users and then use that similarity to estimate the user's preference

for the movie as a weighted average of the other users' ratings.

At a high level, we can distill the logic of the recommendation job into the

following pseudo-code:

for each item i for which u has no preference
    for each user v that has a preference for i
        compute similarity s between u and v
        calculate running average of v's preference for i, weighted by s
return top ranked (weighted average) i
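
As a rough illustration only, the pseudo-code can be sketched in plain Python. This is not Mahout's implementation. The toy ratings, the function names (cosine_similarity, recommend), and the choice of cosine similarity as the statistical measure are all assumptions made for the example.

```python
from math import sqrt

# Hypothetical toy ratings (user -> {movie: rating}); not MovieLens data.
ratings = {
    "u1": {"A": 5.0, "B": 3.0},
    "u2": {"A": 4.0, "B": 2.0, "C": 4.0},
    "u3": {"A": 1.0, "B": 5.0, "C": 2.0},
}


def cosine_similarity(a, b):
    """Cosine similarity over the items both users rated (0.0 if none)."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[i] * b[i] for i in shared)
    norm_a = sqrt(sum(a[i] ** 2 for i in shared))
    norm_b = sqrt(sum(b[i] ** 2 for i in shared))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend(user, ratings, top_n=3):
    """Estimate a preference for every item `user` has not rated."""
    seen = ratings[user]
    all_items = {item for prefs in ratings.values() for item in prefs}
    estimates = {}
    # for each item i for which u has no preference
    for item in all_items - set(seen):
        weighted_sum = 0.0
        weight_total = 0.0
        # for each user v that has a preference for i
        for other, prefs in ratings.items():
            if other == user or item not in prefs:
                continue
            # compute similarity s between u and v
            s = cosine_similarity(seen, prefs)
            if s <= 0:
                continue
            # running average of v's preference for i, weighted by s
            weighted_sum += s * prefs[item]
            weight_total += s
        if weight_total > 0:
            estimates[item] = weighted_sum / weight_total
    # return the top-ranked items by estimated preference
    ranked = sorted(estimates.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:top_n]
```

Calling recommend("u1", ratings) returns a single estimate for movie C. Because u1's ratings resemble u2's more closely than u3's, the estimate lands nearer to u2's rating of 4.0 than to u3's rating of 2.0.
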

Note that this is a drastic simplification of what is really occurring behind

the scenes. In fact, we've omitted a key step. If the preceding logic were

implemented as is, it would not perform efficiently and would suffer at scale.

To remedy this, we could introduce the concept of neighborhoods or clusters

of similar users to limit the number of similarity comparisons that need to

be made.
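
The neighborhood idea can be sketched in Python as follows. This is a minimal, assumption-laden example rather than a description of how Mahout forms neighborhoods: the toy ratings, the helper names (similarity, nearest_neighbors, estimate), the cosine measure, and the fixed neighborhood size k are all hypothetical.

```python
from math import sqrt

# Hypothetical toy ratings (user -> {movie: rating}); not MovieLens data.
ratings = {
    "u1": {"A": 5.0, "B": 3.0},
    "u2": {"A": 4.0, "B": 2.0, "C": 4.0},
    "u3": {"A": 1.0, "B": 5.0, "C": 2.0},
}


def similarity(a, b):
    """Cosine similarity over co-rated items (an assumed measure)."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[i] * b[i] for i in shared)
    norm_a = sqrt(sum(a[i] ** 2 for i in shared))
    norm_b = sqrt(sum(b[i] ** 2 for i in shared))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def nearest_neighbors(user, ratings, k=2):
    """Return up to k (similarity, user) pairs, most similar first."""
    scored = [
        (similarity(ratings[user], prefs), other)
        for other, prefs in ratings.items()
        if other != user
    ]
    top = sorted(scored, reverse=True)[:k]
    return [(s, other) for s, other in top if s > 0]


def estimate(item, neighbors, ratings):
    """Similarity-weighted average of the neighbors' ratings for item."""
    rated = [(s, v) for s, v in neighbors if item in ratings[v]]
    total = sum(s for s, _ in rated)
    if total == 0:
        return None  # no neighbor has rated this item
    return sum(s * ratings[v][item] for s, v in rated) / total
```

The neighborhood is computed once per user and then reused for every unrated movie, so the per-movie loop only consults k neighbors instead of every user who rated that movie. With k=1, u1's neighborhood contains only u2, and the estimate for movie C is simply u2's rating.
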

A detailed explanation of how this works is beyond the scope of this topic,

but you can find ample material about how Mahout handles neighborhood

formation and similarity calculations on the Mahout website.

With a high-level understanding of how user-to-user recommendations are

generated, use the hadoop jar command at the Hadoop command line to

start the RecommenderJob:
