Rating(789,134,5.278933936827717)
Rating(789,156,5.250959077906759)
Rating(789,432,5.169863417126231)
Inspecting the recommendations
We can sense-check these recommendations by taking a quick look at the titles of
the movies a user has rated and the titles of the recommended movies. First, we need
to load the movie data (one of the datasets we explored in the previous chapter).
We'll collect this data as a Map[Int, String] that maps the movie ID to the title:
val movies = sc.textFile("/PATH/ml-100k/u.item")
val titles = movies
  .map(line => line.split("\\|").take(2))
  .map(array => (array(0).toInt, array(1)))
  .collectAsMap()
titles(123)
The preceding code produces the following output:
res68: String = Frighteners, The (1996)
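To see what the parsing step above does to a single line, here is a minimal sketch that runs on an ordinary Scala collection instead of an RDD. The two sample rows only imitate the pipe-delimited layout of u.item; every field after the title is a placeholder, and the names sampleLines and sampleTitles are introduced here purely for illustration.

```scala
// Illustrative rows in a pipe-delimited layout like u.item. Everything after
// the title is a placeholder, not real file content.
val sampleLines = Seq(
  "1|Toy Story (1995)|release-date||url|0|0|0",
  "123|Frighteners, The (1996)|release-date||url|0|0|0"
)

// String.split takes a regular expression, and an unescaped | means
// alternation, so the pipe has to be escaped as "\\|".
val sampleTitles: Map[Int, String] = sampleLines
  .map(line => line.split("\\|").take(2))   // keep only the ID and the title
  .map(array => (array(0).toInt, array(1))) // (movie ID, title) pair
  .toMap

println(sampleTitles(123)) // prints: Frighteners, The (1996)
```

Note that the comma inside the title is harmless here because the file is delimited by pipes, not commas.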
For our user 789, we can find out what movies they have rated, take the 10 movies with
the highest rating, and then check the titles. We will do this now by first using the
keyBy Spark function to create an RDD of key-value pairs from our ratings RDD, where the
key will be the user ID. We will then use the lookup function to return just the ratings
for this key (that is, that particular user ID) to the driver:
val moviesForUser = ratings.keyBy(_.user).lookup(789)
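To make the two calls concrete, the following minimal sketch mimics them on an ordinary Scala collection rather than an RDD. The Rating case class here is a local stand-in with the same three fields as MLlib's Rating, and the sample ratings are made up for illustration; it approximates the semantics and is not how Spark implements these functions.

```scala
// Local stand-in for MLlib's Rating so the sketch runs without Spark.
case class Rating(user: Int, product: Int, rating: Double)

// Made-up ratings, for illustration only.
val sampleRatings = Seq(
  Rating(789, 10, 5.0),
  Rating(42, 10, 3.0),
  Rating(789, 20, 4.0)
)

// keyBy(_.user): pair every element with the key extracted by the function.
val keyed: Seq[(Int, Rating)] = sampleRatings.map(r => (r.user, r))

// lookup(789): gather the values of all pairs whose key equals 789.
val forUser: Seq[Rating] = keyed.collect { case (user, r) if user == 789 => r }

println(forUser.size) // prints: 2
```

On a real pair RDD, lookup likewise returns a local Seq of the matching values, which is why the result can be inspected directly on the driver.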
Let's see how many movies this user has rated. This will be the size of the
moviesForUser collection:
println(moviesForUser.size)
We will see that this user has rated 33 movies.
Next, we will take the 10 movies with the highest ratings by sorting the moviesForUser
collection on the rating field of each Rating object. We will then look up the title for
the product ID of each Rating in our mapping of movie titles and print out the top 10
titles with their ratings: