res113: Double = 1.0
Now, we are ready to apply our similarity metric to each item:
val sims = model.productFeatures.map { case (id, factor) =>
  val factorVector = new DoubleMatrix(factor)
  val sim = cosineSimilarity(factorVector, itemVector)
  (id, sim)
}
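For intuition, the same per-item computation can be reproduced without Spark or jblas. The following is a minimal sketch in plain Scala, assuming that cosineSimilarity is the usual dot product divided by the product of the vector norms and that factor vectors are plain Array[Double] values. The names CosineSketch and cosine, and the toy factors, are hypothetical and not part of the original example.

```scala
object CosineSketch {
  // Cosine similarity: dot product divided by the product of the L2 norms.
  def cosine(a: Array[Double], b: Array[Double]): Double = {
    require(a.length == b.length, "vectors must have the same length")
    val dot = a.zip(b).map { case (x, y) => x * y }.sum
    val normA = math.sqrt(a.map(x => x * x).sum)
    val normB = math.sqrt(b.map(x => x * x).sum)
    dot / (normA * normB)
  }

  // Hypothetical toy data standing in for model.productFeatures.
  val factors: Seq[(Int, Array[Double])] = Seq(
    1 -> Array(1.0, 0.0),
    2 -> Array(0.0, 1.0),
    3 -> Array(1.0, 1.0)
  )

  // Compare every item with item 1, mirroring the map over productFeatures.
  val itemVector: Array[Double] = factors.head._2
  val sims: Seq[(Int, Double)] = factors.map { case (id, factor) =>
    (id, cosine(factor, itemVector))
  }
}
```

In this toy data, item 1 compared with itself scores 1.0, the orthogonal item 2 scores 0.0, and item 3 falls in between.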
Next, we can compute the top 10 most similar items by sorting on the similarity score of each item:
// recall we defined K = 10 earlier
val sortedSims = sims.top(K)(Ordering.by[(Int, Double), Double] {
  case (id, similarity) => similarity
})
In the preceding code snippet, we used Spark's top function, which is an efficient way to compute top-K results in a distributed fashion, instead of using collect to return all the data to the driver and sorting it locally (remember that we could be dealing with millions of users and items in the case of recommendation models).
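The reason a distributed top-K can avoid moving all the data is that each partition only ever needs to contribute its own K best candidates. The sketch below simulates that idea in plain Scala, with partitions modeled as nested Seq values. It illustrates the concept only and is not Spark's actual implementation; TopKSketch, topKLocal, merge, and topK are hypothetical names.

```scala
object TopKSketch {
  // Keep only the k largest elements of a single partition.
  def topKLocal[T](part: Seq[T], k: Int)(implicit ord: Ordering[T]): Seq[T] =
    part.sorted(ord.reverse).take(k)

  // Merge two partial results and trim the result back to k elements.
  def merge[T](a: Seq[T], b: Seq[T], k: Int)(implicit ord: Ordering[T]): Seq[T] =
    (a ++ b).sorted(ord.reverse).take(k)

  // Reduce every partition to k candidates, then combine the candidates.
  // At no point are more than 2 * k elements held in one merge step.
  def topK[T](partitions: Seq[Seq[T]], k: Int)(implicit ord: Ordering[T]): Seq[T] =
    partitions
      .map(p => topKLocal(p, k))
      .foldLeft(Seq.empty[T])((a, b) => merge(a, b, k))
}
```

For example, TopKSketch.topK(Seq(Seq(5, 1, 9), Seq(7, 3), Seq(8, 2)), 3) yields the three largest values across all partitions in descending order.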
We need to tell Spark how to sort the (item id, similarity score) pairs in the sims RDD. To do this, we pass an extra argument to top: a Scala Ordering object that tells Spark to sort by the value in each key-value pair (that is, by similarity).
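To see what such an Ordering does in isolation, it can be tried on a small local collection. This is a minimal sketch with made-up pairs; OrderingSketch, bySimilarity, and the sample values are hypothetical. It relies on the implicit Ordering[Double] from the standard library, which recent Scala versions may flag with a deprecation warning.

```scala
object OrderingSketch {
  // Order (id, similarity) pairs by the similarity component only.
  val bySimilarity: Ordering[(Int, Double)] =
    Ordering.by[(Int, Double), Double] { case (_, similarity) => similarity }

  // Hypothetical sample pairs standing in for the contents of sims.
  val pairs: Seq[(Int, Double)] = Seq((1, 0.2), (2, 0.9), (3, 0.5))

  // The pair with the highest similarity under this ordering.
  val best: (Int, Double) = pairs.max(bySimilarity)

  // All pairs from most to least similar.
  val ranked: Seq[(Int, Double)] = pairs.sorted(bySimilarity.reverse)
}
```

Because the ordering ignores the id, two items with the same similarity compare as equal, so their relative position in the result should not be relied upon.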
Finally, we can print the 10 items with the highest computed similarity metric to our given
item:
println(sortedSims.take(10).mkString("\n"))
You will see output like the following:
(567,1.0000000000000002)
(1471,0.6932331537649621)
(670,0.6898690594544726)
(201,0.6897964975027041)
(343,0.6891221044611473)
(563,0.6864214133620066)