res113: Double = 1.0
Now, we are ready to apply our similarity metric to each item:
val sims = model.productFeatures.map { case (id, factor) =>
  val factorVector = new DoubleMatrix(factor)
  val sim = cosineSimilarity(factorVector, itemVector)
  (id, sim)
}
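To make the per-item computation concrete without a cluster, here is a minimal local sketch of the same mapping. It assumes plain Scala arrays in place of jblas DoubleMatrix objects; the names cosineSim, localFeatures, queryVector, and localSims, as well as the factor values, are hypothetical and chosen only for illustration.

```scala
// Plain-Scala stand-in for a cosine similarity function:
// the dot product divided by the product of the two vector norms.
def cosineSim(a: Array[Double], b: Array[Double]): Double = {
  require(a.length == b.length, "vectors must have the same length")
  val dot = a.zip(b).map { case (x, y) => x * y }.sum
  val normA = math.sqrt(a.map(x => x * x).sum)
  val normB = math.sqrt(b.map(x => x * x).sum)
  dot / (normA * normB)
}

// Hypothetical item factors keyed by item id (illustrative values only).
val localFeatures: Seq[(Int, Array[Double])] = Seq(
  (1, Array(1.0, 0.0)),
  (2, Array(0.0, 1.0)),
  (3, Array(1.0, 1.0))
)

// The item we compare every other item against.
val queryVector = Array(1.0, 0.0)

// Same shape as the RDD version: map each (id, factor) to (id, similarity).
val localSims = localFeatures.map { case (id, factor) =>
  (id, cosineSim(factor, queryVector))
}
```

An item compared with itself scores 1.0, an orthogonal item scores 0.0, and everything else falls in between, which is why the query item itself usually appears at the top of the ranked list.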
Next, we can compute the top 10 most similar items by sorting on the similarity score of
each item:
// recall we defined K = 10 earlier
val sortedSims = sims.top(K)(Ordering.by[(Int, Double), Double] {
  case (id, similarity) => similarity
})
In the preceding code snippet, we used Spark's top function, which is an efficient way to
compute top-K results in a distributed fashion, instead of using collect to return all the
data to the driver and sorting it locally (remember that we could be dealing with millions
of users and items in the case of recommendation models).
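The idea behind a distributed top-K can be sketched locally: each partition keeps only its K best candidates, and the small per-partition results are then merged. The following is a rough illustration of that idea in plain Scala, not Spark's actual implementation; the helper topK, the partition contents, and the value of k are all assumptions made for the example.

```scala
import scala.collection.mutable

// Keep only the k highest-scoring (id, score) pairs from an iterator.
def topK(pairs: Iterator[(Int, Double)], k: Int): List[(Int, Double)] = {
  require(k > 0, "k must be positive")
  // Reversed ordering turns the queue into a min-heap on score,
  // so the head is always the weakest of the current top k.
  val heap = mutable.PriorityQueue.empty[(Int, Double)](
    Ordering.by[(Int, Double), Double](_._2).reverse)
  pairs.foreach { p =>
    if (heap.size < k) heap.enqueue(p)
    else if (p._2 > heap.head._2) {
      heap.dequeue() // drop the current weakest entry
      heap.enqueue(p)
    }
  }
  // Return the survivors with the highest score first.
  heap.toList.sortBy(-_._2)
}

// Two hypothetical partitions of (item id, similarity) pairs.
val partitions = Seq(
  Seq((1, 0.2), (2, 0.9), (3, 0.5)),
  Seq((4, 0.7), (5, 0.1), (6, 0.8))
)
val k = 2

// Each partition is reduced to k candidates before anything is merged.
val perPartition = partitions.map(p => topK(p.iterator, k))
val merged = topK(perPartition.flatten.iterator, k)
```

At no point does more than K elements per partition need to travel to a single place, which is the advantage over collecting everything and sorting locally.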
We need to tell Spark how to sort the (item id, similarity score) pairs in the
sims RDD. To do this, we will pass an extra argument to top, which is a Scala
Ordering object that tells Spark that it should sort by the value in the key-value pair
(that is, sort by similarity).
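The effect of such an Ordering can be tried on an ordinary Scala collection. The sketch below assumes a small, made-up list of (id, similarity) pairs; samplePairs, bySimilarity, best, and topTwo are hypothetical names used only here.

```scala
// Made-up (item id, similarity) pairs, for illustration only.
val samplePairs = Seq((10, 0.3), (20, 0.9), (30, 0.6))

// Compare pairs by their second element (the similarity), ignoring the id.
val bySimilarity = Ordering.by[(Int, Double), Double] {
  case (id, similarity) => similarity
}

// The largest element under this ordering is the most similar item.
val best = samplePairs.max(bySimilarity)

// Sorting with the reversed ordering lists the most similar items first.
val topTwo = samplePairs.sorted(bySimilarity.reverse).take(2)
```

Here the pair with id 20 is selected even though id 30 is numerically larger, because the ordering looks only at the similarity value.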
Finally, we can print the 10 items with the highest computed similarity to our given
item:
println(sortedSims.take(10).mkString("\n"))
You will see output like the following:
(567,1.0000000000000002)
(1471,0.6932331537649621)
(670,0.6898690594544726)
(201,0.6897964975027041)
(343,0.6891221044611473)
(563,0.6864214133620066)