Building a Recommendation Engine with Spark - Machine Learning with Spark

You will see the output as follows:

14/04/13 21:02:01 INFO MemoryStore: ensureFreeSpace(672960) called with curMem=4006896, maxMem=311387750
14/04/13 21:02:01 INFO MemoryStore: Block broadcast_21 stored as values to memory (estimated size 657.2 KB, free 292.5 MB)
imBroadcast: org.apache.spark.broadcast.Broadcast[org.jblas.DoubleMatrix] = Broadcast(21)

Now we are ready to compute the recommendations for each user. We will do this by applying a map function to each user factor, within which we will multiply the movie-factor matrix by the user-factor vector. The result is a vector (of length 1682, that is, the number of movies we have) with the predicted rating for each movie. We will then sort these predictions in descending order of the predicted rating:

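import org.jblas.DoubleMatrix

val allRecs = model.userFeatures.map { case (userId, array) =>
  // Wrap the user's factor array as a jblas column vector
  val userVector = new DoubleMatrix(array)
  // Multiply the broadcast movie-factor matrix by the user vector: one score per movie
  val scores = imBroadcast.value.mmul(userVector)
  // Sort the scores in descending order, keeping each score's 0-based index
  val sortedWithId = scores.data.zipWithIndex.sortBy(-_._1)
  // Add 1 to turn the 0-based matrix indices into movie IDs
  val recommendedIds = sortedWithId.map(_._2 + 1).toSeq
  (userId, recommendedIds)
}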

You will see the following on the screen:

allRecs: org.apache.spark.rdd.RDD[(Int, Seq[Int])] = MappedRDD[269] at map at <console>:29

As we can see, we now have an RDD that contains a list of movie IDs for each user ID. These movie IDs are sorted in descending order of the predicted rating, so the first ID in each list is the top recommendation for that user.

Tip

Note that we needed to add 1 to the returned movie IDs (see _._2 + 1 in the preceding code snippet), as the item-factor matrix is 0-indexed, while our movie IDs start at 1.
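
To make the scoring and ranking step concrete, here is a minimal sketch in plain Scala collections, without Spark or jblas. The object name RankingSketch, the rank function, and the toy factor values are hypothetical and chosen only for illustration; the sketch assumes that row i of the item-factor matrix belongs to the movie with ID i + 1, as described above.

```scala
object RankingSketch {
  // Scores every movie for one user and returns movie IDs, best first.
  // itemFactors: one row of latent factors per movie (row i -> movie ID i + 1).
  // userVector:  the latent factors for a single user.
  def rank(itemFactors: Array[Array[Double]], userVector: Array[Double]): Seq[Int] = {
    // Matrix-vector product: the dot product of each movie row with the user vector
    val scores: Array[Double] = itemFactors.map { row =>
      row.zip(userVector).map { case (a, b) => a * b }.sum
    }
    // Pair each score with its 0-based index, sort by descending score,
    // then add 1 to convert the index into a 1-based movie ID
    scores.zipWithIndex.sortBy(-_._1).map(_._2 + 1).toSeq
  }
}

// Toy data: 3 movies, 2 latent factors (hypothetical values)
val toyItemFactors = Array(
  Array(1.0, 0.0), // movie ID 1, score 0.5
  Array(0.0, 2.0), // movie ID 2, score 2.0
  Array(1.0, 1.0)  // movie ID 3, score 1.5
)
val toyUserVector = Array(0.5, 1.0)

val toyRecs = RankingSketch.rank(toyItemFactors, toyUserVector)
// toyRecs contains the movie IDs 2, 3, 1
```

Without the + 1, the same call would return the row indices 1, 2, 0, which would point at the wrong movies when looked up by ID.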
