Big Data Analytics - Microsoft Big Data Solutions

5.0. You can cross-reference these IDs with the data file provided as part of the GroupLens data set download.
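That cross-referencing can be scripted. The following is a minimal Python sketch, and it rests on two assumptions that the text above does not state. First, each line of the job's output is assumed to look like userID<TAB>[itemID:score,itemID:score,...]. Second, the movie file is assumed to be pipe-delimited, with the movie ID and title in the first two fields, as in the MovieLens 100K u.item file. The helper names load_titles, parse_recommendations, and describe are hypothetical, and the sample lines are made up.

```python
def load_titles(lines):
    """Map movie ID -> title from pipe-delimited lines such as '1|Title (1995)|...'.

    Assumes the u.item-style layout: ID in field 0, title in field 1.
    """
    titles = {}
    for line in lines:
        fields = line.rstrip("\n").split("|")
        if len(fields) >= 2 and fields[0].isdigit():
            titles[int(fields[0])] = fields[1]
    return titles


def parse_recommendations(line):
    """Split 'userID<TAB>[item:score,item:score,...]' into (userID, [(item, score), ...]).

    The output layout is an assumption, not taken from the text.
    """
    user, _, payload = line.strip().partition("\t")
    pairs = []
    for pair in payload.strip("[]").split(","):
        if pair:  # an empty list "[]" yields one empty string; skip it
            item, score = pair.split(":")
            pairs.append((int(item), float(score)))
    return int(user), pairs


def describe(line, titles):
    """Replace movie IDs in one output line with titles where they are known."""
    user, pairs = parse_recommendations(line)
    return user, [(titles.get(item, f"<unknown {item}>"), score) for item, score in pairs]


if __name__ == "__main__":
    # Made-up sample lines, for illustration only.
    titles = load_titles(["1|Movie A (1995)|01-Jan-1995||", "2|Movie B (1996)|01-Jan-1996||"])
    print(describe("7\t[2:4.5,1:3.0]", titles))
    # prints (7, [('Movie B (1996)', 4.5), ('Movie A (1995)', 3.0)])
```

If you read the real u.item file with this sketch, you may need to open it with encoding="latin-1", because that file is not plain UTF-8.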

Running an Item-to-Item Recommendation Job

In the previous example, we used the GroupLens data set to generate recommendations by calculating the similarity between users. In this demonstration, we instead use the notion of item similarity to determine our item recommendations.

For this exercise, you will reuse the GroupLens data set, because the item-to-item RecommenderJob has the same input format and data requirements as the user-based job. In fact, the two jobs overlap significantly, down to the job parameters they accept.
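Because both jobs read the same ratings input, it may help to see that input spelled out. As we understand Mahout's documented input format, its Hadoop recommender jobs expect one userID,itemID,preference triple per line. The following minimal sketch converts rows laid out like the MovieLens 100K u.data file (user ID, item ID, rating, and timestamp, separated by tabs) into that form and drops the timestamp. The u.data layout and the helper name to_mahout_csv are assumptions for illustration.

```python
def to_mahout_csv(lines):
    """Convert whitespace-separated 'userID itemID rating [timestamp]' lines
    into 'userID,itemID,rating' lines, dropping any extra columns.

    The input layout (u.data style) is an assumption.
    """
    converted = []
    for line in lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        user_id, item_id, rating = fields[:3]
        converted.append(f"{user_id},{item_id},{rating}")
    return converted
```

Feeding it the lines of a file (for example, an open file object) returns a list of CSV strings. You could write those back out and upload them to HDFS as the job's input.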

In the user-to-user example, the Mahout library uses a similarity metric to form neighborhoods (clusters) of users and then makes recommendations based on the reviews of statistically similar users. The item-to-item recommender takes a different approach, focusing instead on the items themselves (in our case, movies).
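A toy, in-memory sketch of the user-to-user idea may make the contrast clearer. It is not Mahout's implementation. Here the similarity metric is cosine similarity over each user's full rating vector, with unrated items treated as 0; Mahout offers several metrics, and this is just one choice. The neighborhood is the k most similar users, and an unrated item is scored by the similarity-weighted average of the neighbors' ratings. The names cosine and recommend_user_based, and the dict-of-dicts ratings layout, are hypothetical.

```python
from math import sqrt


def cosine(a, b):
    """Cosine similarity of two users' rating dicts; unrated items count as 0."""
    dot = sum(r * b[item] for item, r in a.items() if item in b)
    norm_a = sqrt(sum(r * r for r in a.values()))
    norm_b = sqrt(sum(r * r for r in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recommend_user_based(ratings, user, k=2, n=3):
    """Recommend up to n items `user` has not rated, using the k most similar users.

    `ratings` maps userID -> {itemID: rating} (a hypothetical layout).
    """
    mine = ratings[user]
    # Form the neighborhood: the k other users with the highest similarity.
    neighbors = sorted(
        ((cosine(mine, theirs), uid) for uid, theirs in ratings.items() if uid != user),
        reverse=True,
    )[:k]
    totals, weights = {}, {}
    for sim, uid in neighbors:
        if sim <= 0:
            continue  # ignore neighbors with no rated items in common
        for item, r in ratings[uid].items():
            if item not in mine:
                # Accumulate a similarity-weighted sum of the neighbor's rating.
                totals[item] = totals.get(item, 0.0) + sim * r
                weights[item] = weights.get(item, 0.0) + sim
    scores = {item: totals[item] / weights[item] for item in totals}
    # Highest predicted rating first; ties broken by item ID.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:n]
```

With k=1, each user's recommendations come entirely from their single closest neighbor. Larger k values blend in more users and smooth the predictions.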

Just as the user-based job measured the similarity between users, the item-to-item recommender must calculate the similarity between movies. To do so, the recommender uses both the users' reviews and the co-occurrence of movie reviews by users (how often the same user reviewed both movies) to determine a similarity score. Using these similarity scores, the job can then generate recommendations based on the provided input.
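One way to read that description is shown in the following minimal in-memory sketch. The similarity between two movies is the number of users who reviewed both (raw co-occurrence). A movie the user has not rated is then scored by averaging the user's own ratings, weighted by how strongly each rated movie co-occurs with it. This is an illustration under those assumptions, not Mahout's distributed code. Mahout lets you choose among several similarity measures through a job parameter, and its exact scoring may differ. The names cooccurrence and recommend_item_based, and the dict-of-dicts ratings layout, are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def cooccurrence(ratings):
    """Count, for every ordered pair of items, how many users rated both."""
    counts = defaultdict(int)
    for items in ratings.values():
        for a, b in combinations(sorted(items), 2):
            counts[(a, b)] += 1
            counts[(b, a)] += 1
    return counts


def recommend_item_based(ratings, user, n=3):
    """Recommend up to n items `user` has not rated.

    Each candidate's score is the user's own ratings averaged with weights
    equal to each rated item's co-occurrence count with the candidate.
    `ratings` maps userID -> {itemID: rating} (a hypothetical layout).
    """
    counts = cooccurrence(ratings)
    mine = ratings[user]
    all_items = {item for items in ratings.values() for item in items}
    scores = {}
    for candidate in all_items - set(mine):
        weights = {j: counts.get((candidate, j), 0) for j in mine}
        total_weight = sum(weights.values())
        if total_weight:  # skip items that never co-occur with the user's items
            scores[candidate] = sum(weights[j] * r for j, r in mine.items()) / total_weight
    # Highest predicted rating first; ties broken by item ID.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:n]
```

One practical difference from the user-based sketch is that item-to-item similarities depend only on the ratings, not on the user being served. A batch job can therefore compute them once and reuse them for every user.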

To generate item-based recommendations, follow these steps:

1. Open the Hadoop command-line console.

2. Mahout stores the intermediate files produced by its chained MapReduce jobs in a temporary directory. Before you can run a new Mahout job, you need to purge this directory. Use the following command to delete the temporary directory and its contents:

hadoop fs -rmr -skipTrash /user/<USER>/temp

3. Enter the following command to start Mahout's item-based RecommenderJob:

hadoop jar c:\mahout\mahout-core-0.7-job.jar
