Building a Recommendation Engine with Spark - Machine Learning with Spark

res113: Double = 1.0

Now, we are ready to apply our similarity metric to each item:

val sims = model.productFeatures.map { case (id, factor) =>
  val factorVector = new DoubleMatrix(factor)
  val sim = cosineSimilarity(factorVector, itemVector)
  (id, sim)
}
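To make the per-item computation concrete, here is a minimal, self-contained sketch that runs without Spark or jblas. It is an illustration under assumptions, not the code used in this chapter: plain arrays stand in for DoubleMatrix, a local Seq stands in for the productFeatures RDD, and the factor values are made up.

```scala
// Hypothetical stand-ins: plain arrays replace jblas DoubleMatrix,
// and a local Seq replaces the productFeatures RDD.
object CosineSketch {
  // Cosine similarity = dot(a, b) / (||a|| * ||b||).
  // Note: a zero-length (all-zero) vector yields NaN here.
  def cosineSimilarity(a: Array[Double], b: Array[Double]): Double = {
    require(a.length == b.length, "vectors must have the same length")
    val dot = a.zip(b).map { case (x, y) => x * y }.sum
    val normA = math.sqrt(a.map(x => x * x).sum)
    val normB = math.sqrt(b.map(x => x * x).sum)
    dot / (normA * normB)
  }

  // Made-up factor vectors keyed by item id (illustrative values only)
  val productFeatures: Seq[(Int, Array[Double])] = Seq(
    1 -> Array(1.0, 0.0),
    2 -> Array(0.0, 1.0),
    3 -> Array(1.0, 1.0)
  )

  // The item we compare everything against (here identical to item 1)
  val itemVector: Array[Double] = Array(1.0, 0.0)

  // Same shape as the snippet above: one (id, similarity) pair per item
  val sims: Seq[(Int, Double)] = productFeatures.map { case (id, factor) =>
    (id, cosineSimilarity(factor, itemVector))
  }
}
```

In this toy data, the item that matches itemVector gets a similarity of 1.0 and the orthogonal item gets 0.0, which mirrors the sanity check on the item's similarity with itself.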

Next, we can compute the top 10 most similar items by sorting on the similarity score for each item:

// recall we defined K = 10 earlier
val sortedSims = sims.top(K)(Ordering.by[(Int, Double), Double] {
  case (id, similarity) => similarity
})

In the preceding code snippet, we used Spark's top function, which is an efficient way to compute top-K results in a distributed fashion, instead of using collect to return all the data to the driver and sorting it locally (remember that we could be dealing with millions of users and items in the case of recommendation models).
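The general idea behind a distributed top-K can be sketched without Spark. The following is a simplified illustration, not Spark's actual implementation: each "partition" (here just a local Seq, with made-up data) keeps only its K best pairs in a small bounded heap, and only those few candidates are merged at the end. The names TopKSketch, topKOfPartition, and topK are hypothetical.

```scala
import scala.collection.mutable

// Simplified, local illustration of per-partition top-K followed by a merge.
object TopKSketch {
  // Keep only the k pairs with the highest similarity from one partition.
  def topKOfPartition(part: Iterator[(Int, Double)], k: Int): Seq[(Int, Double)] = {
    // Reversed ordering: the head of the queue is the smallest similarity kept so far
    val heap = mutable.PriorityQueue.empty[(Int, Double)](
      Ordering.by[(Int, Double), Double](_._2).reverse)
    part.foreach { pair =>
      heap.enqueue(pair)
      if (heap.size > k) heap.dequeue() // evict the current smallest
    }
    heap.toSeq.sortBy(p => -p._2) // highest similarity first
  }

  // Merge the small per-partition results and keep the overall top k.
  def topK(partitions: Seq[Seq[(Int, Double)]], k: Int): Seq[(Int, Double)] =
    partitions
      .flatMap(p => topKOfPartition(p.iterator, k))
      .sortBy(p => -p._2)
      .take(k)
}
```

The point of the design is that at most K pairs per partition ever need to be gathered in one place, rather than the full dataset as with collect.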

We need to tell Spark how to sort the (item id, similarity score) pairs in the sims RDD. To do this, we pass an extra argument to top: a Scala Ordering object that tells Spark to sort by the value in each key-value pair (that is, by similarity).
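The effect of such an Ordering can be tried on an ordinary Scala collection. The sketch below uses made-up pairs and hypothetical names (OrderingSketch, bySimilarity); it only illustrates how Ordering.by selects the comparison key. Without an explicit ordering, Scala's default ordering for tuples compares the first element (the item ID) before the second, which is not what we want here.

```scala
object OrderingSketch {
  // Made-up (item id, similarity) pairs, for illustration only
  val pairs: Seq[(Int, Double)] = Seq((10, 0.3), (20, 0.8), (30, 0.5))

  // Compare pairs by their second element (the similarity score)
  val bySimilarity: Ordering[(Int, Double)] =
    Ordering.by[(Int, Double), Double] { case (_, similarity) => similarity }

  // The pair with the highest similarity
  val best: (Int, Double) = pairs.max(bySimilarity)

  // All pairs, highest similarity first
  val descending: Seq[(Int, Double)] = pairs.sorted(bySimilarity.reverse)

  // Default tuple ordering picks the largest id instead
  val largestId: (Int, Double) = pairs.max
}
```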

Finally, we can print the 10 items that are most similar to our given item:

println(sortedSims.take(10).mkString("\n"))

You will see output like the following:

(567,1.0000000000000002)
(1471,0.6932331537649621)
(670,0.6898690594544726)
(201,0.6897964975027041)
(343,0.6891221044611473)
(563,0.6864214133620066)
