import org.apache.spark.mllib.recommendation.Rating
Rating()
<console>:13: error: not enough arguments for method apply:
(user: Int, product: Int, rating:
Double)org.apache.spark.mllib.recommendation.Rating in
object Rating.
Unspecified value parameters user, product, rating.
Rating()
^
As we can see from the preceding output, we need to provide the ALS model with an RDD that consists of Rating records. The Rating class, in turn, is just a wrapper around the user id, movie id (called product here), and actual rating arguments. We'll create our ratings dataset using the map method, transforming the array of IDs and ratings into a Rating object:
val ratings = rawRatings.map { case Array(user, movie, rating) =>
  Rating(user.toInt, movie.toInt, rating.toDouble)
}
Note
Notice that we need to use toInt or toDouble to convert the raw rating data (which was extracted as Strings from the text file) to Int or Double numeric inputs. Also note the use of a case statement that allows us to extract the relevant variable names and use them directly (this saves us from having to use something like val user = ratings(0)).
For more on Scala case statements and pattern matching as used here, take a look at
http://docs.scala-lang.org/tutorials/tour/pattern-matching.html .
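As a plain-Scala sketch of the same idea (independent of Spark), we can try the Array pattern match on a single raw record; the sample values "196", "242", and "3" used here are hypothetical, standing in for one tab-separated line of the ratings data:

```scala
object PatternMatchSketch {
  def main(args: Array[String]): Unit = {
    // One hypothetical raw record, split into an Array[String] just as
    // each line of the ratings file would be.
    val fields: Array[String] = "196\t242\t3".split("\t")

    // The case Array(...) pattern binds each element to a name directly,
    // saving us from indexing with fields(0), fields(1), fields(2).
    val (user, movie, ratingValue) = fields match {
      case Array(u, m, r) => (u.toInt, m.toInt, r.toDouble)
    }

    println(s"user=$user, movie=$movie, rating=$ratingValue")
  }
}
```

The same match clause is what the map call applies to every element of rawRatings, with Rating(...) in place of the tuple.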
We now have an RDD[Rating] that we can verify by calling:
ratings.first()
14/03/30 12:32:48 INFO SparkContext: Starting job: first at
<console>:24
14/03/30 12:32:48 INFO DAGScheduler: Got job 2 (first at
<console>:24) with 1 output partitions (allowLocal=true)
14/03/30 12:32:48 INFO DAGScheduler: Final stage: Stage 2