We will use Spark's MLlib library to train our model. Let's take a look at what methods are available for us to use and what input is required. First, import the ALS model from MLlib:
import org.apache.spark.mllib.recommendation.ALS
On the console, we can inspect the available methods on the ALS object using tab completion. Type in ALS. (note the dot) and then press the Tab key. You should see the autocompletion of the methods:
ALS.
asInstanceOf isInstanceOf main
toString train trainImplicit
The method we want to use is train. If we type ALS.train and hit Enter, we will get an error. However, this error will tell us what the method signature looks like:
ALS.train
<console>:12: error: ambiguous reference to overloaded
definition,
both method train in object ALS of type (ratings:
org.apache.spark.rdd.RDD[org.apache.spark.mllib.recommendation.Rating],
rank: Int, iterations: Int)org.apache.spark.mllib.recommendation.MatrixFactorizationModel
and method train in object ALS of type (ratings:
org.apache.spark.rdd.RDD[org.apache.spark.mllib.recommendation.Rating],
rank: Int, iterations: Int, lambda: Double)org.apache.spark.mllib.recommendation.MatrixFactorizationModel
match expected type ?
ALS.train
^
So, we can see that at a minimum, we need to provide the input arguments ratings, rank, and iterations. The second method also requires an argument called lambda.
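To make the two overloads concrete, here is a minimal sketch of how each one might be called. The helper name trainBoth and the values chosen for rank, iterations, and lambda are placeholders for illustration only; they are assumptions, not recommended settings or anything the text prescribes.

```scala
import org.apache.spark.mllib.recommendation.{ALS, MatrixFactorizationModel, Rating}
import org.apache.spark.rdd.RDD

// Hypothetical helper: calls both overloads reported in the error message above.
// The numeric values are arbitrary placeholders, not tuned settings.
def trainBoth(ratings: RDD[Rating]): (MatrixFactorizationModel, MatrixFactorizationModel) = {
  val rank = 5
  val iterations = 10
  val lambda = 0.01

  // Three-argument overload: ratings, rank, iterations
  val withoutLambda = ALS.train(ratings, rank, iterations)

  // Four-argument overload: ratings, rank, iterations, lambda
  val withLambda = ALS.train(ratings, rank, iterations, lambda)

  (withoutLambda, withLambda)
}
```

Because the call now supplies arguments, the compiler can tell the two definitions apart by their parameter lists, so the ambiguity error no longer occurs.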
We'll cover rank, iterations, and lambda shortly, but first let's take a look at the ratings argument. Let's import the Rating class that it references and use a similar approach to find out what an instance of Rating requires, by typing in Rating() and hitting Enter: