import org.apache.spark.mllib.recommendation.Rating
Rating()

<console>:13: error: not enough arguments for method apply: (user: Int, product: Int, rating: Double)org.apache.spark.mllib.recommendation.Rating in object Rating.
Unspecified value parameters user, product, rating.
       Rating()
       ^
As we can see from the preceding output, we need to provide the ALS model with an RDD that consists of Rating records. A Rating class, in turn, is just a wrapper around the user id, movie id (called product here), and the actual rating arguments. We'll create our rating dataset using the map method, transforming the array of IDs and ratings into a Rating object:
val ratings = rawRatings.map { case Array(user, movie, rating) =>
  Rating(user.toInt, movie.toInt, rating.toDouble)
}
Note

Notice that we need to use toInt or toDouble to convert the raw rating data (which was extracted as Strings from the text file) to Int or Double numeric inputs. Also, note the use of a case statement that allows us to extract the relevant variable names and use them directly (this saves us from having to use something like val user = ratings(0)).
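To make the role of the case statement concrete, here is a minimal sketch in plain Scala (no Spark required) that contrasts index-based field access with pattern matching on an Array. The Rating case class and the sample ID strings here are illustrative stand-ins, not the actual MovieLens data:

```scala
// Sketch: pattern matching on Array vs. index-based access,
// mirroring the transform used to build the ratings dataset.
object PatternMatchSketch {
  // Stand-in for org.apache.spark.mllib.recommendation.Rating
  case class Rating(user: Int, product: Int, rating: Double)

  def main(args: Array[String]): Unit = {
    // Hypothetical raw records: user id, movie id, rating as Strings
    val raw = Seq(Array("196", "242", "3.0"), Array("186", "302", "3.5"))

    // Without pattern matching: access each field by index.
    val viaIndex = raw.map { fields =>
      Rating(fields(0).toInt, fields(1).toInt, fields(2).toDouble)
    }

    // With pattern matching: bind names directly in the case clause.
    val viaMatch = raw.map { case Array(user, movie, rating) =>
      Rating(user.toInt, movie.toInt, rating.toDouble)
    }

    assert(viaIndex == viaMatch)
    println(viaMatch.head)
  }
}
```

Both versions produce the same result; the case clause simply lets us name the array elements at the point of destructuring instead of indexing into the array.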
For more on Scala case statements and pattern matching as used here, take a look at
We now have an RDD[Rating] that we can verify by calling:
ratings.first()
14/03/30 12:32:48 INFO SparkContext: Starting job: first at <console>:24
14/03/30 12:32:48 INFO DAGScheduler: Got job 2 (first at <console>:24) with 1 output partitions (allowLocal=true)
14/03/30 12:32:48 INFO DAGScheduler: Final stage: Stage 2