Building a Recommendation Engine with Spark - Machine Learning with Spark


Rating(789,134,5.278933936827717)

Rating(789,156,5.250959077906759)

Rating(789,432,5.169863417126231)

Inspecting the recommendations

We can sense-check these recommendations by taking a quick look at the titles of the movies the user has rated and of the recommended movies. First, we need to load the movie data (one of the datasets we explored in the previous chapter). We'll collect this data as a Map[Int, String] that maps each movie ID to its title:

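// Each line of u.item is pipe-delimited: movie ID, then title, then further fields
val movies = sc.textFile("/PATH/ml-100k/u.item")

// Keep the first two fields and collect them to the driver as a map of ID -> title
val titles = movies.map(line => line.split("\\|").take(2)).
  map(array => (array(0).toInt, array(1))).
  collectAsMap()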

titles(123)

The preceding code produces the following output:

res68: String = Frighteners, The (1996)
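The pattern passed to split is escaped because String.split takes a regular expression, and an unescaped | is the alternation operator rather than a literal pipe. The following is a minimal sketch of the same parsing step on plain Scala collections, with no Spark required. The sample lines are made-up stand-ins that only imitate the pipe-delimited layout of u.item (only the title for ID 123 comes from the output above), and parseTitle is a hypothetical helper name:

```scala
// Illustrative stand-ins for lines of u.item: "id|title|...other fields".
// Only the title for ID 123 is taken from the text; the rest is placeholder data.
val sampleLines = Seq(
  "123|Frighteners, The (1996)|placeholder|fields",
  "456|Some Other Movie (1990)|placeholder|fields"
)

// Hypothetical helper: keep the first two pipe-delimited fields as (id, title).
// split takes a regex, so the pipe must be escaped to be matched literally.
def parseTitle(line: String): (Int, String) = {
  val fields = line.split("\\|").take(2)
  (fields(0).toInt, fields(1))
}

// Local equivalent of the map produced by collectAsMap()
val sampleTitles: Map[Int, String] = sampleLines.map(parseTitle).toMap

println(sampleTitles(123)) // Frighteners, The (1996)
```

Note that a title containing a comma, as here, is unaffected because only the pipe acts as the delimiter.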

For our user, 789, we can find the movies they have rated, take the 10 with the highest ratings, and check their titles. We first use Spark's keyBy function to create an RDD of key-value pairs from our ratings RDD, where the key is the user ID. We then use the lookup function to return just the ratings for this key (that is, for that particular user ID) to the driver:

val moviesForUser = ratings.keyBy(_.user).lookup(789)

Let's see how many movies this user has rated. This will be the size of the

moviesForUser collection:

println(moviesForUser.size)

We will see that this user has rated 33 movies.
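To make the keyBy and lookup steps concrete, here is a rough local analogue on plain Scala collections. It is only a sketch of the semantics, not how Spark executes them. The Rating case class below is a stand-in with the same field names as MLlib's Rating, and the ratings are made-up values for illustration:

```scala
// Stand-in for org.apache.spark.mllib.recommendation.Rating (same field names).
case class Rating(user: Int, product: Int, rating: Double)

// Made-up ratings, for illustration only.
val localRatings = Seq(
  Rating(789, 1, 4.0),
  Rating(789, 2, 5.0),
  Rating(1, 2, 3.0)
)

// keyBy(_.user) pairs each element with the key computed from it ...
val keyed: Seq[(Int, Rating)] = localRatings.map(r => (r.user, r))

// ... and lookup(789) returns the values whose key equals 789.
val forUser: Seq[Rating] = keyed.collect { case (user, r) if user == 789 => r }

println(forUser.size) // 2
```

On an RDD, lookup is an action, so the matching ratings come back to the driver as a local collection, which is why size can be called on the result directly.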

Next, we will take the 10 movies with the highest ratings by sorting the moviesForUser collection on the rating field of the Rating object. We will then look up the title for the product ID of each Rating in our map of movie titles and print the top 10 titles with their ratings:
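As a rough illustration of this sort-and-look-up step, the following sketch runs on plain Scala collections with made-up data. The Rating case class is a stand-in for MLlib's Rating, the names userRatings and titleById are hypothetical, and the sketch takes the top 2 rather than the top 10 only because the sample is tiny:

```scala
// Stand-in for org.apache.spark.mllib.recommendation.Rating (same field names).
case class Rating(user: Int, product: Int, rating: Double)

// Made-up ratings and titles, for illustration only.
val userRatings = Seq(
  Rating(789, 10, 3.0),
  Rating(789, 20, 5.0),
  Rating(789, 30, 4.0)
)
val titleById = Map(10 -> "Movie A", 20 -> "Movie B", 30 -> "Movie C")

val topTitles: Seq[(String, Double)] =
  userRatings
    .sortBy(r => -r.rating)                      // negate to sort highest rating first
    .take(2)                                     // this would be 10 on the real data
    .map(r => (titleById(r.product), r.rating))  // product ID -> title

topTitles.foreach(println)
```

Sorting on the negated rating is one simple way to get descending order; sortBy(_.rating).reverse would give the same ranking for distinct ratings.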
