Building a Recommendation Engine with Spark - Machine Learning with Spark

We will use Spark's MLlib library to train our model. Let's take a look at what methods are available for us to use and what input is required. First, import the ALS model from MLlib:

import org.apache.spark.mllib.recommendation.ALS

On the console, we can inspect the available methods on the ALS object using tab completion. Type in ALS. (note the dot) and then press the Tab key. You should see the autocompletion of the methods:

ALS.

asInstanceOf isInstanceOf main

toString train trainImplicit

The method we want to use is train. If we type ALS.train and hit Enter, we will get an error. However, this error will tell us what the method signatures look like:

ALS.train

<console>:12: error: ambiguous reference to overloaded definition,
both method train in object ALS of type (ratings: org.apache.spark.rdd.RDD[org.apache.spark.mllib.recommendation.Rating], rank: Int, iterations: Int)org.apache.spark.mllib.recommendation.MatrixFactorizationModel
and method train in object ALS of type (ratings: org.apache.spark.rdd.RDD[org.apache.spark.mllib.recommendation.Rating], rank: Int, iterations: Int, lambda: Double)org.apache.spark.mllib.recommendation.MatrixFactorizationModel
match expected type ?
ALS.train
^

So, we can see that at a minimum, we need to provide the input arguments ratings, rank, and iterations. The second method also requires an argument called lambda.
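The ambiguity reported in the error message comes from train being overloaded: a bare reference gives the compiler nothing to choose between the two definitions, while a full argument list selects one by its arity. The following is a minimal, Spark-free sketch of that behaviour. ToyALS, its String return type, and the Seq[Double] stand-in for the ratings RDD are hypothetical and exist only for illustration; only the parameter names rank, iterations, and lambda are taken from the signatures shown above.

```scala
// Hypothetical stand-in for the ALS object (not part of Spark).
// It declares two overloads of `train` with the same parameter names
// as in the error message; the types and bodies are illustrative only.
object ToyALS {
  // Three-argument overload: ratings, rank, iterations.
  def train(ratings: Seq[Double], rank: Int, iterations: Int): String =
    s"rank=$rank, iterations=$iterations"

  // Four-argument overload: additionally takes lambda.
  def train(ratings: Seq[Double], rank: Int, iterations: Int, lambda: Double): String =
    s"rank=$rank, iterations=$iterations, lambda=$lambda"
}

object OverloadDemo {
  def main(args: Array[String]): Unit = {
    val ratings = Seq(4.0, 3.5, 5.0) // made-up values, not real rating data

    // A bare `ToyALS.train` (no argument list) is rejected by the compiler,
    // because it cannot decide which overload is meant.
    // Supplying the arguments resolves the overload by arity:
    println(ToyALS.train(ratings, 10, 5))       // selects the three-argument overload
    println(ToyALS.train(ratings, 10, 5, 0.01)) // selects the four-argument overload
  }
}
```

The same resolution rule applies to the real ALS.train: passing three arguments selects the first signature, and adding a lambda value selects the second.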

We'll cover rank, iterations, and lambda shortly, but first let's take a look at the ratings argument. Let's import the Rating class that it references and use a similar approach to find out what an instance of Rating requires, by typing in Rating() and hitting Enter:
