Note that the operations used in MLlib's ALS implementation are lazy transformations, so the actual computation will only be performed once we call some sort of action on the resulting RDDs of the user and item factors. We can force the computation using a Spark action such as count:
model.userFeatures.count
This will trigger the computation, and we will see quite a bit of log output similar to the following:
14/03/30 13:10:40 INFO SparkContext: Starting job: count at <console>:26
14/03/30 13:10:40 INFO DAGScheduler: Registering RDD 665 (map at ALS.scala:147)
14/03/30 13:10:40 INFO DAGScheduler: Registering RDD 664 (map at ALS.scala:146)
14/03/30 13:10:40 INFO DAGScheduler: Registering RDD 674 (mapPartitionsWithIndex at ALS.scala:164)
...
14/03/30 13:10:45 INFO SparkContext: Job finished: count at <console>:26, took 5.068255 s
res16: Long = 943
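The lazy-transformation-then-action pattern described above can be sketched with a plain RDD. This is a minimal, self-contained illustration (assuming Spark is on the classpath; the local master setting and app name are made up for the example, and in spark-shell the `sc` context is already provided):

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical local context for a standalone script;
// in spark-shell, `sc` already exists and this setup is unnecessary.
val sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("lazy-demo"))

// map is a lazy transformation: no Spark job runs at this point
val doubled = sc.parallelize(1 to 943).map(_ * 2)

// count is an action: only now does Spark schedule and execute the job
val n = doubled.count()
println(n)
sc.stop()
```

Here `count` returns 943, mirroring the user-factor count above: the `map` step is merely recorded until the action forces evaluation.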
If we call count for the movie factors, we will see the following output:
model.productFeatures.count
14/03/30 13:15:21 INFO SparkContext: Starting job: count at <console>:26
14/03/30 13:15:21 INFO DAGScheduler: Got job 10 (count at <console>:26) with 1 output partitions (allowLocal=false)
14/03/30 13:15:21 INFO DAGScheduler: Final stage: Stage 165 (count at <console>:26)
14/03/30 13:15:21 INFO DAGScheduler: Parents of final stage: List(Stage 169, Stage 166)
14/03/30 13:15:21 INFO DAGScheduler: Missing parents: List()
14/03/30 13:15:21 INFO DAGScheduler: Submitting Stage 165 (FlatMappedRDD[883] at flatMap at ALS.scala:231), which has no missing parents
14/03/30 13:15:21 INFO DAGScheduler: Submitting 1 missing
...
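To see where counts like these come from, it helps to inspect the factor RDDs themselves: `userFeatures` and `productFeatures` are each an `RDD[(Int, Array[Double])]`, with one factor vector per user or item ID, and each vector has length equal to the model's rank. The following is a self-contained sketch on a made-up toy ratings set (the user and item IDs, rank, and app name are illustrative only):

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.recommendation.{ALS, Rating}

// Hypothetical local context; in spark-shell, `sc` already exists.
val sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("als-dims"))

// A tiny, made-up ratings set: two users (1, 2) rating three items (101-103)
val ratings = sc.parallelize(Seq(
  Rating(1, 101, 5.0), Rating(1, 102, 3.0),
  Rating(2, 101, 4.0), Rating(2, 103, 1.0)))

val rank = 5
val model = ALS.train(ratings, rank, 10, 0.01)

// One (id, factorVector) pair per user seen in the ratings
val numUsers = model.userFeatures.count()
// Each factor vector has `rank` entries
val factorSize = model.userFeatures.first()._2.length
sc.stop()
```

On this toy data, `numUsers` is 2 and `factorSize` equals the rank of 5, which is exactly why the `count` calls above returned the number of users and movies in the dataset.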