We will use Spark's MLlib library to train our model. Let's take a look at what methods
are available for us to use and what input is required. First, import the ALS model from
MLlib:
import org.apache.spark.mllib.recommendation.ALS
On the console, we can inspect the available methods on the ALS object using tab completion. Type in ALS. (note the dot) and then press the Tab key. You should see the auto-completion of the methods:
ALS.
asInstanceOf isInstanceOf main
toString train trainImplicit
The method we want to use is train. If we type ALS.train and hit Enter, we will get
an error. However, this error will tell us what the method signature looks like:
ALS.train
<console>:12: error: ambiguous reference to overloaded definition,
both method train in object ALS of type (ratings: org.apache.spark.rdd.RDD[org.apache.spark.mllib.recommendation.Rating], rank: Int, iterations: Int)org.apache.spark.mllib.recommendation.MatrixFactorizationModel
and method train in object ALS of type (ratings: org.apache.spark.rdd.RDD[org.apache.spark.mllib.recommendation.Rating], rank: Int, iterations: Int, lambda: Double)org.apache.spark.mllib.recommendation.MatrixFactorizationModel
match expected type ?
       ALS.train
           ^
So, we can see that at a minimum, we need to provide the input arguments ratings,
rank, and iterations. The second method also requires an argument called lambda.
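To make the two call shapes from that error concrete, here is a minimal sketch; the rank, iteration count, and lambda values are arbitrary placeholders rather than recommendations, and the ratings parameter simply stands in for the RDD of ratings we build later:
import org.apache.spark.mllib.recommendation.{ALS, MatrixFactorizationModel, Rating}
import org.apache.spark.rdd.RDD

// First overload: ratings, rank, and iterations only
def trainBasic(ratings: RDD[Rating]): MatrixFactorizationModel =
  ALS.train(ratings, 10, 10)

// Second overload: adds lambda, the regularization parameter
def trainRegularized(ratings: RDD[Rating]): MatrixFactorizationModel =
  ALS.train(ratings, 10, 10, 0.01)
Pasting these definitions into the console only checks that the calls compile; actually training a model requires a real RDD of ratings.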
We'll cover these three shortly, but let's take a look at the ratings argument. First, let's
import the Rating class that it references and use a similar approach to find out what an
instance of Rating requires, by typing in Rating() and hitting Enter:
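As a rough sketch of where that exploration leads, assuming MLlib's Rating case class takes a user ID, a product ID, and a rating value (the specific IDs and scores below are made-up examples):
import org.apache.spark.mllib.recommendation.Rating

// Rating(user: Int, product: Int, rating: Double)
val single = Rating(789, 123, 3.0)   // user 789 gave product 123 a rating of 3.0

// In the Spark shell, sc is the pre-created SparkContext;
// ALS.train expects an RDD of such Rating objects
val ratings = sc.parallelize(Seq(
  Rating(1, 10, 4.0),
  Rating(1, 20, 2.5),
  Rating(2, 10, 5.0)
))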