import org.apache.spark.mllib.recommendation.Rating
Rating()

<console>:13: error: not enough arguments for method apply: (user: Int, product: Int, rating: Double)org.apache.spark.mllib.recommendation.Rating in object Rating.
Unspecified value parameters user, product, rating.
       Rating()
       ^
As we can see from the preceding output, we need to provide the ALS model with an RDD that consists of Rating records. A Rating class, in turn, is just a wrapper around the user id, movie id (called product here), and the actual rating arguments. We'll create our rating dataset using the map method, transforming the array of IDs and ratings into a Rating object:
val ratings = rawRatings.map { case Array(user, movie, rating) =>
  Rating(user.toInt, movie.toInt, rating.toDouble)
}
Note

Notice that we need to use toInt or toDouble to convert the raw rating data (which was extracted as Strings from the text file) to Int or Double numeric inputs. Also, note the use of a case statement that allows us to extract the relevant variable names and use them directly (this saves us from having to use something like val user = ratings(0)).
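To make the role of the case statement concrete, here is a minimal sketch in plain Scala (no Spark required) that contrasts index-based field access with pattern matching on an Array. The Rating case class and the sample ID strings here are illustrative stand-ins, not the actual MovieLens data:

```scala
// Sketch: pattern matching on Array vs. index-based access,
// mirroring the transform used to build the ratings dataset.
object PatternMatchSketch {
  // Stand-in for org.apache.spark.mllib.recommendation.Rating
  case class Rating(user: Int, product: Int, rating: Double)

  def main(args: Array[String]): Unit = {
    // Hypothetical raw records: user id, movie id, rating as Strings
    val raw = Seq(Array("196", "242", "3.0"), Array("186", "302", "3.5"))

    // Without pattern matching: access each field by index.
    val viaIndex = raw.map { fields =>
      Rating(fields(0).toInt, fields(1).toInt, fields(2).toDouble)
    }

    // With pattern matching: bind names directly in the case clause.
    val viaMatch = raw.map { case Array(user, movie, rating) =>
      Rating(user.toInt, movie.toInt, rating.toDouble)
    }

    assert(viaIndex == viaMatch)
    println(viaMatch.head)
  }
}
```

Both versions produce the same result; the case clause simply lets us name the array elements at the point of destructuring instead of indexing into the array.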
For more on Scala case statements and pattern matching as used here, take a look at
We now have an RDD[Rating] that we can verify by calling:
ratings.first()
14/03/30 12:32:48 INFO SparkContext: Starting job: first at <console>:24
14/03/30 12:32:48 INFO DAGScheduler: Got job 2 (first at <console>:24) with 1 output partitions (allowLocal=true)
14/03/30 12:32:48 INFO DAGScheduler: Final stage: Stage 2