The preceding code will provide the following output:
Map(2 -> Adventure, 5 -> Comedy, 12 -> Musical, 15 -> Sci-Fi, 8 -> Drama, 18 -> Western, ...
Next, we'll create a new RDD from the movie data and our genre mapping; this RDD contains
the movie ID, title, and genres. We will use this later to create a more readable output
when we evaluate the clusters assigned to each movie by our clustering model.
In the following code section, we will map over each movie and extract the genres
subvector (whose entries are still Strings, "0" or "1", rather than Ints). We will then
apply the zipWithIndex method to pair each entry with its index in the genre subvector,
and we will filter this collection so that we are left only with the positive assignments
(that is, the 1s that denote a genre assignment for the relevant index). We can then use
our extracted genre mapping to map these indices to the textual genres. Finally, we will
inspect the first record of the new RDD to see the result of these operations:
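val titlesAndGenres = movies.map(_.split("\\|")).map { array =>
  // Fields from index 5 onward are the genre flags ("0" or "1")
  val genres = array.toSeq.slice(5, array.size)
  // Keep the positions flagged "1" and look up each genre name by its index
  val genresAssigned = genres.zipWithIndex.filter { case (g, idx) =>
    g == "1"
  }.map { case (g, idx) =>
    genreMap(idx.toString)
  }
  // Result: (movie ID, (title, assigned genres))
  (array(0).toInt, (array(1), genresAssigned))
}
println(titlesAndGenres.first)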
This should output the following result:
(1,(Toy Story (1995),ArrayBuffer(Animation, Children's, Comedy)))
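To see the zipWithIndex, filter, and map steps in isolation, the following is a minimal sketch that applies the same logic to a single record using plain Scala collections, without Spark. The object name GenreDecodeSketch, the decode helper, the shortened sample line (only six genre flags), the placeholder URL, and the six-entry genre map are all assumptions made for illustration; they are not part of the original code.

```scala
object GenreDecodeSketch {
  // Illustrative subset of the genre mapping: index (as a String) -> genre name.
  val genreMap: Map[String, String] = Map(
    "0" -> "unknown", "1" -> "Action", "2" -> "Adventure",
    "3" -> "Animation", "4" -> "Children's", "5" -> "Comedy"
  )

  // Hypothetical helper: decodes one pipe-delimited movie record into
  // (movie ID, (title, assigned genres)).
  def decode(line: String): (Int, (String, Seq[String])) = {
    val fields = line.split("\\|")
    // Fields from index 5 onward are the genre flags ("0" or "1").
    val flags = fields.toSeq.slice(5, fields.length)
    val genres = flags.zipWithIndex
      .filter { case (flag, _) => flag == "1" }       // keep positive assignments
      .map { case (_, idx) => genreMap(idx.toString) } // index -> genre name
    (fields(0).toInt, (fields(1), genres))
  }

  def main(args: Array[String]): Unit = {
    // Shortened sample record with a placeholder URL (assumption, not real data).
    val sample = "1|Toy Story (1995)|01-Jan-1995||http://example.com|0|0|0|1|1|1"
    println(decode(sample))
  }
}
```

Because the lookup key is idx.toString, this only works when the genre map is keyed by the index as a String, as it is in the code above. A map keyed by Int would use genreMap(idx) instead.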
Training the recommendation model
To get the user and movie factor vectors, we first need to train another recommendation
model. Fortunately, we have already done this in Chapter 4, Building a Recommendation
Engine with Spark, so we will follow the same procedure: