Note that the operations used in MLlib's ALS implementation are lazy transformations, so the actual computation will only be performed once we call some sort of action on the resulting RDDs of the user and item factors. We can force the computation using a Spark action such as count:
model.userFeatures.count
This will trigger the computation, and we will see quite a bit of log output similar to the following:
14/03/30 13:10:40 INFO SparkContext: Starting job: count at <console>:26
14/03/30 13:10:40 INFO DAGScheduler: Registering RDD 665 (map at ALS.scala:147)
14/03/30 13:10:40 INFO DAGScheduler: Registering RDD 664 (map at ALS.scala:146)
14/03/30 13:10:40 INFO DAGScheduler: Registering RDD 674 (mapPartitionsWithIndex at ALS.scala:164)
...
14/03/30 13:10:45 INFO SparkContext: Job finished: count at <console>:26, took 5.068255 s
res16: Long = 943
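The lazy-transformation-then-action pattern described above can be sketched with a plain RDD. This is a minimal, self-contained illustration (assuming Spark is on the classpath; the local master setting and app name are made up for the example, and in spark-shell the `sc` context is already provided):

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical local context for a standalone script;
// in spark-shell, `sc` already exists and this setup is unnecessary.
val sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("lazy-demo"))

// map is a lazy transformation: no Spark job runs at this point
val doubled = sc.parallelize(1 to 943).map(_ * 2)

// count is an action: only now does Spark schedule and execute the job
val n = doubled.count()
println(n)
sc.stop()
```

Here `count` returns 943, mirroring the user-factor count above: the `map` step is merely recorded until the action forces evaluation.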
If we call count for the movie factors, we will see the following output:
model.productFeatures.count
14/03/30 13:15:21 INFO SparkContext: Starting job: count at <console>:26
14/03/30 13:15:21 INFO DAGScheduler: Got job 10 (count at <console>:26) with 1 output partitions (allowLocal=false)
14/03/30 13:15:21 INFO DAGScheduler: Final stage: Stage 165 (count at <console>:26)
14/03/30 13:15:21 INFO DAGScheduler: Parents of final stage: List(Stage 169, Stage 166)
14/03/30 13:15:21 INFO DAGScheduler: Missing parents: List()
14/03/30 13:15:21 INFO DAGScheduler: Submitting Stage 165 (FlatMappedRDD[883] at flatMap at ALS.scala:231), which has no missing parents
14/03/30 13:15:21 INFO DAGScheduler: Submitting 1 missing
...
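To see where counts like these come from, it helps to inspect the factor RDDs themselves: `userFeatures` and `productFeatures` are each an `RDD[(Int, Array[Double])]`, with one factor vector per user or item ID, and each vector has length equal to the model's rank. The following is a self-contained sketch on a made-up toy ratings set (the user and item IDs, rank, and app name are illustrative only):

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.recommendation.{ALS, Rating}

// Hypothetical local context; in spark-shell, `sc` already exists.
val sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("als-dims"))

// A tiny, made-up ratings set: two users (1, 2) rating three items (101-103)
val ratings = sc.parallelize(Seq(
  Rating(1, 101, 5.0), Rating(1, 102, 3.0),
  Rating(2, 101, 4.0), Rating(2, 103, 1.0)))

val rank = 5
val model = ALS.train(ratings, rank, 10, 0.01)

// One (id, factorVector) pair per user seen in the ratings
val numUsers = model.userFeatures.count()
// Each factor vector has `rank` entries
val factorSize = model.userFeatures.first()._2.length
sc.stop()
```

On this toy data, `numUsers` is 2 and `factorSize` equals the rank of 5, which is exactly why the `count` calls above returned the number of users and movies in the dataset.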