Running a User-to-user Recommendation Job
The first job that we run uses collaborative filtering to generate
user-to-user recommendations for the MovieLens data set. Before continuing,
be sure that you have deployed the Mahout jar files to your HDInsight
cluster.
Before starting the job, let's look at what happens once it runs. First, we
iterate through each user, finding movies that the user has not yet rated.
Next, for each of those movies, we find the other users who have rated it.
We use a statistical measure to calculate the similarity between the two
users, and then use that similarity to estimate the user's preference for
the movie as a weighted average of the other users' ratings.
At a high level, we can distill the logic of the recommendation job for a
single user u into the following pseudo-code:
for each item i that u has no preference for
    for each user v that has a preference for i
        compute similarity s between u and v
        calculate running average of v's preference for i, weighted by s
return top ranked (weighted average) i
Note that this is a drastic simplification of what is really occurring behind
the scenes. In fact, we've omitted a key step. If the preceding logic were
implemented as is, it would not perform efficiently and would suffer at scale.
To remedy this, we could introduce the concept of neighborhoods or clusters
of similar users to limit the number of similarity comparisons that need to
be made.
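
To make the pseudo-code and the neighborhood idea concrete, here is a minimal Python sketch. It is an illustration only and not how Mahout implements the job: the toy ratings, the function names, the choice of cosine similarity as the "statistical measure," and the neighborhood_size parameter are all assumptions made for this example.

```python
from math import sqrt

# Hypothetical toy ratings (user -> {movie: rating}); not MovieLens data.
RATINGS = {
    "alice": {"A": 5.0, "B": 3.0, "C": 4.0},
    "bob":   {"A": 4.0, "B": 2.0, "C": 5.0, "D": 4.0},
    "carol": {"A": 1.0, "B": 5.0, "D": 2.0},
}


def cosine_similarity(prefs_u, prefs_v):
    """Cosine similarity over the items both users have rated.

    Assumption: cosine is only one possible similarity measure.
    """
    shared = set(prefs_u) & set(prefs_v)
    if not shared:
        return 0.0
    dot = sum(prefs_u[i] * prefs_v[i] for i in shared)
    norm_u = sqrt(sum(prefs_u[i] ** 2 for i in shared))
    norm_v = sqrt(sum(prefs_v[i] ** 2 for i in shared))
    return dot / (norm_u * norm_v)


def recommend(ratings, u, neighborhood_size=2, top_n=3):
    """Estimate u's preference for unrated items from similar users."""
    # Neighborhood step: compare u with every other user once, then keep
    # only the most similar users instead of consulting everyone per item.
    sims = {v: cosine_similarity(ratings[u], ratings[v])
            for v in ratings if v != u}
    neighbors = sorted(sims, key=sims.get, reverse=True)[:neighborhood_size]

    weighted_sum = {}   # item -> sum of (similarity * rating)
    weight_total = {}   # item -> sum of similarities
    for v in neighbors:
        s = sims[v]
        if s <= 0:
            continue  # ignore users with no positive similarity
        for item, pref in ratings[v].items():
            if item in ratings[u]:
                continue  # u already has a preference for this item
            weighted_sum[item] = weighted_sum.get(item, 0.0) + s * pref
            weight_total[item] = weight_total.get(item, 0.0) + s

    # Weighted average per item, ranked from highest to lowest estimate.
    estimates = {i: weighted_sum[i] / weight_total[i] for i in weighted_sum}
    ranked = sorted(estimates.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:top_n]


if __name__ == "__main__":
    print(recommend(RATINGS, "alice"))
```

Calling recommend(RATINGS, "alice") returns a ranked list of (movie, estimated rating) pairs for the movies alice has not rated. Shrinking neighborhood_size reduces the number of users consulted for each estimate, which is the trade-off the neighborhood concept introduces.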
A detailed explanation of how this works is beyond the scope of this topic,
but you can find ample material about how Mahout handles neighborhood
formation and similarity calculations on the Mahout website.
With a high-level understanding of how user-to-user recommendations are
generated, use the hadoop jar command at the Hadoop command line to
start the RecommenderJob: