• Spearman correlation
• Cosine
• Tanimoto coefficient
• Log-likelihood
• Pearson correlation
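If the recommendation job is Apache Mahout's distributed item-based RecommenderJob (an assumption here, made because these measures match its similarity options), the measure can be selected with the --similarityClassname option. In the following sketch, the jar name and input path are placeholders rather than values from this chapter:

hadoop jar mahout-core-<VERSION>-job.jar org.apache.mahout.cf.taste.hadoop.item.RecommenderJob --input /user/<YOUR USERNAME>/chapter15/input/<RATINGS FILE> --output /user/<YOUR USERNAME>/chapter15/output/userrecommendations --similarityClassname SIMILARITY_LOGLIKELIHOOD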
Once the job has started, it will take between 15 and 20 minutes to run if you
are using a four-node cluster. During this time, a series of MapReduce jobs
process the data and generate the movie recommendations.
When the job has completed, you can view the output files using the
following command:
hadoop fs -ls /user/<YOUR USERNAME>/chapter15/output/userrecommendations
You can find the generated recommendations in the part-r-00000 file.
To export the file from HDFS to your local file system, use the following
command:
hadoop fs -copyToLocal /user/<YOUR USERNAME>/chapter15/output/userrecommendations/part-r-00000 c:\<LOCAL OUTPUT DIRECTORY>\recommendations.csv
You can review the file to find the recommendations generated for each user.
The output from the recommendation job takes the following format:
UserID [ItemID:Estimated Rating, ...]
An example of the output is shown here:
1 [1566:5.0,1036:5.0,1033:5.0,1032:5.0,1031:5.0,3107:5.0]
In this example, for the user identified by the ID of 1, we would recommend
the movies identified by the IDs 1566 (The Man from Down Under), 1036
(Drop Dead Fred), 1033 (Homeward Bound II: Lost in San Francisco), and
so on. The estimated rating for each of these movies for this specific user is 5.0.
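As a rough sketch of how this output can be consumed, the following Python code parses each line into a per-user list of (ItemID, rating) pairs. The whitespace delimiter between the UserID and the bracketed list (a tab in standard Mahout output) and the file name recommendations.csv are assumptions based on the example above; adjust them for your environment.

# Minimal sketch: parse recommendation output lines of the form
#   UserID [ItemID:rating,ItemID:rating,...]
# Delimiter and file name are assumptions; adjust as needed.
def parse_recommendations(path):
    recommendations = {}
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            # Split the user ID from the bracketed recommendation list
            user_id, item_list = line.split(None, 1)
            pairs = item_list.strip("[]").split(",")
            recommendations[user_id] = [
                (item_id, float(rating))
                for item_id, rating in (pair.split(":") for pair in pairs)
            ]
    return recommendations

recs = parse_recommendations("recommendations.csv")
print(recs["1"])  # e.g. [('1566', 5.0), ('1036', 5.0), ...]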